Re: Documentation / Resources to help getting started with Elasticsearch Adapter

2023-07-07 Thread xiong duan
Hi  Sumanth,

As Alessandro mentioned, the document will show how to use the sqlline to
search the Elasticsearch.
If you have the Calcite Code, we can find some test in Elasticsearch module.
And If you want to create Jdbc connection, you can try add the
Elasticsearch-Schema-File as the connection property "model" value.

Alessandro Solimando  于2023年7月7日周五 05:28写道:

> Hello Joe, Sumanth,
>
> You can subscribe to dev the mailing list by following the instructions at
> this link: https://calcite.apache.org/develop/#getting-started
>
> The devolop section will also provide more information on how to set things
> up and get started with Calcite.
>
> For some basic documentation around the ES adapter you can refer to
> https://calcite.apache.org/docs/elasticsearch_adapter.html
>
> The code itself and unit tests for the adapter provide additional
> information, although it will be a bit harder to digest if you are not that
> familiar with Java yet.
>
> Welcome to Calcite!
>
> Best regards,
> Alessandro
>
>
>
> On Thu 6 Jul 2023, 17:23 Joe Talafous,  wrote:
>
> > Hi Calcite Devs,
> >
> > Please keep me in the loop on this thread.  I am an experienced Java dev,
> > but just joined and need to learn Calcite and Apache tools.  This thread
> > looks perfect for me to shadow and learn, plus help Sumanth with ES and
> > Java.
> >
> > For beginners, where is the proper place to read these emails?   I can
> > read https://lists.apache.org/list.html?dev@calcite.apache.org but
> cannot
> > login into Pony Mail (see attached), should I be able to?  Right now I am
> > replying from my gmail.
> >
> > I have been granted Jira access by Francis and receiving github messages.
> >
> > Is there an informal chat room?
> >
> > There will be glaring gaps in my knowledge of tools and processes that I
> > am not aware, so please let me know.
> >
> > Looking forward to my journey towards Calcite excellence!!
> > Joe
> >
> > On Thu, Jul 6, 2023 at 9:59 AM Sumanth  wrote:
> >
> >> Hi Calcite Devs,
> >>
> >> I am planning on using Apache Calcite to connect with ElasticSearch.
> >>
> >> I was not able to find documentation for the same in Apache Calcite
> >> Website
> >> apart from documentation explaining connecting to ElasticSearch using
> >> sqlline (Which i was facing errors and was not able to rectify).
> >>
> >> If possible, Could you please share any / all getting started resources
> on
> >> onnecting to ElasticSearch programmatically or any existing code sampled
> >> that we can use to connect to ElasticSearch.
> >>
> >> I am new to Java and it was hard to infer the Calcite Elasticsearch
> >> Adapter
> >> Code to connect. It would be really helpful if I could get a tutorial
> >> document on connecting to ElasticSearch programmatically.
> >>
> >> Thanks in advance for your help and all the work you are doing with
> >> Calcite
> >> and I will make sure to share these resources with the community if it
> >> helps me. So others with the same issue can refer to it.
> >>
> >>
> >>
> >> --
> >> *Kind Regards*
> >> *Sumanth*
> >>
> >
>


Re: Java Doc about RexProgramBuilderBase

2023-07-07 Thread Zhe Hu
Thank for your clarification! I’ll try to fix it shortly.




 Replied Message 
| From | Julian Hyde |
| Date | 07/8/2023 02:16 |
| To |  |
| Subject | Re: Java Doc about RexProgramBuilderBase |
People refactoring code should remember that their IDE can move and rename 
fields but is not so good at changing documentation. (Maybe in a couple of 
years, with advances in generative AI?!)

And when documentation and code don’t line up, people don’t trust either. (I’m 
as guilty of this as anyone.)

On Jul 7, 2023, at 5:31 AM, Benchao Li  wrote:

This usually happens. When javadoc and the implementation diverges, the
javadoc is mostly possible wrong and need to be improved to match the real
behavior.

For this specific case, I agree with you that the javadoc for
`vParamNotNull` and `vDecimal(int arg)` are not correct, please fix them.
(We can do this kind of trivial work without a Jira ticket)

Zhe Hu  于2023年7月7日周五 14:45写道:

Hi community.
Recently, when I review CALCITE-5769(
https://github.com/apache/calcite/pull/3296), I found something a little
confusing.

First, the java doc in RexProgramBuilderBase.vParamNotNull(), which meant
to create non-nullable variable, but it’s returning description is
“nullable varchar variable”.
Second, we use vDecimal(int arg) to create nullable decimal variable, but
the RelDataType we pass in is “nonNullableDecimal”, which I think should be
“nullableDecimal”. So does the other vXxx() methods.
I’m not sure if I understand right here. If it’s something we can improve,
I’ll file a JIRA case to record and fix it.


Best regards,
Zhe Hu



--

Best,
Benchao Li


RE: Re: Calcite for Lineage

2023-07-07 Thread mbudiu
Which parser are you using, the server (DDL) or Babel?
I think DATEADD exists only in Babel.
To solve this problem in our project we have combined both parsers into one,
so now we can parse Babel + DDL.
https://github.com/feldera/dbsp/pull/276

Mihai

-Original Message-
From: Nathaniel Vala 
Sent: Thursday, July 06, 2023 9:13 PM
To: dev@calcite.apache.org
Subject: RE: Re: Calcite for Lineage

Hi Askar and Mihai,

Thanks for the help. I have modified the code to use a DDL parser, Minhai
was right about the Babel one not having all the DDL statements in it. I
have managed to parse this but still falling over at token that should
parse.

When parsing queries using `SqlDdlParserImpl.FACTORY` and `Lex.SQL_SERVER` I
try to parse a simple DATEADD function and it's failing.

```sql
SELECT  DATEADD(year, 1, '2017/08/25') AS DateAdd FROM unienrollements t1
```

Java throws an `InvocationTargetException` near 'YEAR' and that it was
expecting  However in that list, it has YEAR and YEAR (. To
me, this should be able to parse, have I missed something in the config to
enable this?

If you need some code, this can be found here:
https://github.com/Spydernaz/sqlLineage


Furthermore, thanks for the suggestion regarding the RelMetadata but for the
life of me cannot config it. Seems like I need to have a full connection
established and fetch live schemas for it to validate against? This isn't
really an option for me at the moment. I will upload the RelMetadata attempt
in a separate branch of this repo

On 2023/06/23 17:12:19 Askar Bozcan wrote:
> Oh, true. It can parse CREATE TABLE however.
> CREATE TABLE is pretty much the only DDL I needed to parse, so I 
> didn't really notice this.
> Thanks for the correction!
>
> - Askar
>
>
>
>
> On Fri, 23 Jun 2023 at 19:43, mailto:mb...@gmail.com>>
wrote:
>
> > From my experience the Babel parser does NOT include DDL - at least 
> > not in the released versions.
> >
> > Mihai
> >
> > -Original Message-
> > From: Askar Bozcan
> > Sent: Friday, June 23, 2023 1:14 AM
> > To: dev@calcite.apache.org
> > Subject: Re: Calcite for Lineage
> >
> > Hey Nathaniel,
> > To parse DDL statements, you need to use a different parser (see 
> > SqlParser.Config.withParserFactory
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/p
> > arser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql
> > .parser.SqlParserImplFactory)
> > >)
> > because core Calcite parser
> > <
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/te
> > mplates/Parser.jj
> > >
> > does
> > not support DDL statements by design. You have two options:
> > 1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as 
> > you've
> > said)
> > This is an extended parser that can also parse DDL queries.
> >
> > 2)  Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This 
> > is also an extended parser that can not only parse DDL queries, but 
> > also many extra things not present in ISO SQL standard, such as 
> > Postgre's infix CAST operator *::* Since you're doing lineage 
> > generation I highly recommend using babel package for maximum
compatibility with different DBs' queries.
> >
> > *Mini-explanation on how parsing works in Calcite* Try Go To'ing to 
> > SqlParserImpl code in your IDE. You're going to see a huge file, 
> > full of almost nonsensical if's.
> > The reason is simple: Parser code is build-time generated Java code 
> > based on rules defined in core Calcite parser < 
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/te
> > mplates/Parser.jj
> > >.
> > Parser.jj is a JavaCC file; JavaCC 
> >  > 3e> is a parser generator that, based on rules you define, generates a
pure Java code which can parse LL(k) grammars and generate the parse tree
(SqlNode).
> >
> > Parser.jj, as seen on the repo, is not a pure JavaCC file however, 
> > but an Apache FreeMarker template. There are strings in the 
> > Parser.jj file that start with *${ .* Those are placeholders used by 
> > Apache FreeMarker, a templating engine.
> > Extended parsers (babel, ddlparser) use those placeholders to insert 
> > their custom parsing rules without directly affecting the core 
> > parser file (it all still happens build-time, however).
> > How Babel parser does it, for example:
> > https://github.com/apache/calcite/blob/main/babel/src/main/codegen
> >
> > *An advice for lineage generation*
> > If you're going to create a lineage generator, I highly recommend 
> > using a relational tree (RelNode tree) instead of parse tree 
> > (SqlNode) if you have access to DB tables.
> > After all, a lineage shows the relation between tables/columns, and 
> > so does a relational tree. There is even a built-in method for lineage:
> > getExpressionLineage
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/m
> > 

Re: Force push to calcite main

2023-07-07 Thread xiong duan
I would also be +1 for keeping it in place. I know several times it
happened to correct the commit information to make the commit info better
and format. Using  --force-with-lease( If new commits are added to the
remote branch (by another developer), this command would not update the
remote branch) can effectively avoid coverage. It is very helpful.

Julian Hyde  于2023年7月8日周六 03:38写道:

> I didn’t know about —force-with-lease. I hoped that it existed — because
> otherwise there is an obvious race condition — but never discovered it.
>
> Thank you for teaching me something, Michael.
>
> > On Jul 7, 2023, at 11:34 AM, Michael Mior  wrote:
> >
> > I would also be +1 for keeping it in place. It certainly doesn't solve
> all
> > the problems, but using --force-with-lease instead of --force is
> generally
> > a good idea. It just makes sure no one else has pushed anything since
> your
> > last update. If they have, it will block the force push. At that point,
> if
> > I still wanted to force push, I would rebase to incorporate their changes
> > if possible, and make sure I notify whoever pushed.
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> >
> > On Fri, Jul 7, 2023 at 2:19 PM Julian Hyde 
> wrote:
> >
> >>> I prefer keeping force push in place
> >>
> >> +1. I use force push about a dozen times a year and hopefully no one
> >> notices. It helps the commit history clean.
> >>
> >> I know that force pushes can confuse CI systems and make things
> confusing
> >> to other users, so I try to be judicious. I trust other committers to do
> >> the same. (I suspect that this was a learning experience for Tanner.)
> >>
> >> Julian
> >>
> >>
> >>> On Jul 7, 2023, at 4:12 AM, Stamatis Zampetakis 
> >> wrote:
> >>>
> >>> @Tanner No worries we all did this at some point in time, thanks a lot
> >>> for following up!
> >>>
> >>> @Stanilovsky: I prefer keeping force push in place and avoiding messy
> >>> reverts that are usually necessary in various situations where we make
> >>> a mistake. All commits are archived so there is nothing that we can't
> >>> fix (I think).
> >>>
> >>> On Thu, Jul 6, 2023 at 3:45 PM Tanner Clary
> >>>  wrote:
> 
>  Hello,
> 
>  This was my mistake, my apologies. I will update the hashes. Sorry for
> >> any
>  inconvenience.
> 
>  Tanner
> 
>  On Thu, Jul 6, 2023 at 3:34 AM stanilovsky evgeny <
>  estanilovs...@gridgain.com> wrote:
> 
> > I already told that community need to vote for prohibit force push.
> >
> >> Hello,
> >>
> >> It appears that there was a force push to main yesterday [1]
> rewriting
> >> the history for a bunch of commits. I don't know if it was
> intentional
> >> or not but it seems that now resolved JIRAs (after CALCITE-5810 I
> >> think) are pointing to non-existent commits.
> >>
> >> Can someone please update the JIRA tickets with the correct commit
> >> hashes and also ensure that we didn't lose anything after the
> rebase?
> >>
> >> Best,
> >> Stamatis
> >>
> >> [1]
> https://lists.apache.org/thread/7jjnbkkh9tv49sjcc5kg2tm7c54tj861
> >
> >>
> >>
>
>


Re: Java Doc about RexProgramBuilderBase

2023-07-07 Thread xiong duan
Usually we need to improve the Java Doc or remove the misleading invalid
code. Because the right code has the unit test to cover it. Yes, This is a
bug and we should improve it. Feel free to submit PR for this.

Julian Hyde  于2023年7月8日周六 02:16写道:

> People refactoring code should remember that their IDE can move and rename
> fields but is not so good at changing documentation. (Maybe in a couple of
> years, with advances in generative AI?!)
>
> And when documentation and code don’t line up, people don’t trust either.
> (I’m as guilty of this as anyone.)
>
> > On Jul 7, 2023, at 5:31 AM, Benchao Li  wrote:
> >
> > This usually happens. When javadoc and the implementation diverges, the
> > javadoc is mostly possible wrong and need to be improved to match the
> real
> > behavior.
> >
> > For this specific case, I agree with you that the javadoc for
> > `vParamNotNull` and `vDecimal(int arg)` are not correct, please fix them.
> > (We can do this kind of trivial work without a Jira ticket)
> >
> > Zhe Hu  于2023年7月7日周五 14:45写道:
> >
> >> Hi community.
> >> Recently, when I review CALCITE-5769(
> >> https://github.com/apache/calcite/pull/3296), I found something a
> little
> >> confusing.
> >>
> >> First, the java doc in RexProgramBuilderBase.vParamNotNull(), which
> meant
> >> to create non-nullable variable, but it’s returning description is
> >> “nullable varchar variable”.
> >> Second, we use vDecimal(int arg) to create nullable decimal variable,
> but
> >> the RelDataType we pass in is “nonNullableDecimal”, which I think
> should be
> >> “nullableDecimal”. So does the other vXxx() methods.
> >> I’m not sure if I understand right here. If it’s something we can
> improve,
> >> I’ll file a JIRA case to record and fix it.
> >>
> >>
> >> Best regards,
> >> Zhe Hu
> >>
> >>
> >
> > --
> >
> > Best,
> > Benchao Li
>
>


Fwd: Data Engineering track for Community Over Code NA is calling for presentations

2023-07-07 Thread Julian Hyde
Calcite community members,

As the message below says, the CFP for Community Over Code North
America closes in a few days.

It would be great if there were talks about Calcite. Consider
submitting to the Data Engineering track (see below), the Big Data
track and the Geospatial track.

Also consider submitting a talk about your experiences in open source
- building community, growing your career and expertise, or solving
software engineering problems.

I plan to submit a talk, and I hope to see you there!

Julian

-- Forwarded message -
From: Jarek Potiuk 
Date: Fri, Jul 7, 2023 at 12:39 PM
Subject: Data Engineering track for Community Over Code NA is calling
for presentations
To: 
Cc: Ismaël Mejía 


Hello Hop community,

Just a reminder, that there are just 6 days left to submit your proposal
for The Community Over Code NA (former ApacheCon) conference.

This is the flagship event for the ASF in Halifax, Nova Scotia, Canada,
October 7-10, 2023 and together with Ismael, we want to encourage you to
submit talks for the Data Engineering track.

More info on the conference https://communityovercode.org/ and more info on
the Data Engineering track:
https://medium.com/@jarekpotiuk/data-engineering-community-over-code-conference-38e9677bb440

We hope to see you in Halifax in October!

J.


Re: Force push to calcite main

2023-07-07 Thread Julian Hyde
I didn’t know about —force-with-lease. I hoped that it existed — because 
otherwise there is an obvious race condition — but never discovered it. 

Thank you for teaching me something, Michael.

> On Jul 7, 2023, at 11:34 AM, Michael Mior  wrote:
> 
> I would also be +1 for keeping it in place. It certainly doesn't solve all
> the problems, but using --force-with-lease instead of --force is generally
> a good idea. It just makes sure no one else has pushed anything since your
> last update. If they have, it will block the force push. At that point, if
> I still wanted to force push, I would rebase to incorporate their changes
> if possible, and make sure I notify whoever pushed.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> On Fri, Jul 7, 2023 at 2:19 PM Julian Hyde  wrote:
> 
>>> I prefer keeping force push in place
>> 
>> +1. I use force push about a dozen times a year and hopefully no one
>> notices. It helps the commit history clean.
>> 
>> I know that force pushes can confuse CI systems and make things confusing
>> to other users, so I try to be judicious. I trust other committers to do
>> the same. (I suspect that this was a learning experience for Tanner.)
>> 
>> Julian
>> 
>> 
>>> On Jul 7, 2023, at 4:12 AM, Stamatis Zampetakis 
>> wrote:
>>> 
>>> @Tanner No worries we all did this at some point in time, thanks a lot
>>> for following up!
>>> 
>>> @Stanilovsky: I prefer keeping force push in place and avoiding messy
>>> reverts that are usually necessary in various situations where we make
>>> a mistake. All commits are archived so there is nothing that we can't
>>> fix (I think).
>>> 
>>> On Thu, Jul 6, 2023 at 3:45 PM Tanner Clary
>>>  wrote:
 
 Hello,
 
 This was my mistake, my apologies. I will update the hashes. Sorry for
>> any
 inconvenience.
 
 Tanner
 
 On Thu, Jul 6, 2023 at 3:34 AM stanilovsky evgeny <
 estanilovs...@gridgain.com> wrote:
 
> I already told that community need to vote for prohibit force push.
> 
>> Hello,
>> 
>> It appears that there was a force push to main yesterday [1] rewriting
>> the history for a bunch of commits. I don't know if it was intentional
>> or not but it seems that now resolved JIRAs (after CALCITE-5810 I
>> think) are pointing to non-existent commits.
>> 
>> Can someone please update the JIRA tickets with the correct commit
>> hashes and also ensure that we didn't lose anything after the rebase?
>> 
>> Best,
>> Stamatis
>> 
>> [1] https://lists.apache.org/thread/7jjnbkkh9tv49sjcc5kg2tm7c54tj861
> 
>> 
>> 



Re: Force push to calcite main

2023-07-07 Thread Michael Mior
I would also be +1 for keeping it in place. It certainly doesn't solve all
the problems, but using --force-with-lease instead of --force is generally
a good idea. It just makes sure no one else has pushed anything since your
last update. If they have, it will block the force push. At that point, if
I still wanted to force push, I would rebase to incorporate their changes
if possible, and make sure I notify whoever pushed.

--
Michael Mior
mm...@apache.org


On Fri, Jul 7, 2023 at 2:19 PM Julian Hyde  wrote:

> > I prefer keeping force push in place
>
> +1. I use force push about a dozen times a year and hopefully no one
> notices. It helps the commit history clean.
>
> I know that force pushes can confuse CI systems and make things confusing
> to other users, so I try to be judicious. I trust other committers to do
> the same. (I suspect that this was a learning experience for Tanner.)
>
> Julian
>
>
> > On Jul 7, 2023, at 4:12 AM, Stamatis Zampetakis 
> wrote:
> >
> > @Tanner No worries we all did this at some point in time, thanks a lot
> > for following up!
> >
> > @Stanilovsky: I prefer keeping force push in place and avoiding messy
> > reverts that are usually necessary in various situations where we make
> > a mistake. All commits are archived so there is nothing that we can't
> > fix (I think).
> >
> > On Thu, Jul 6, 2023 at 3:45 PM Tanner Clary
> >  wrote:
> >>
> >> Hello,
> >>
> >> This was my mistake, my apologies. I will update the hashes. Sorry for
> any
> >> inconvenience.
> >>
> >> Tanner
> >>
> >> On Thu, Jul 6, 2023 at 3:34 AM stanilovsky evgeny <
> >> estanilovs...@gridgain.com> wrote:
> >>
> >>> I already told that community need to vote for prohibit force push.
> >>>
>  Hello,
> 
>  It appears that there was a force push to main yesterday [1] rewriting
>  the history for a bunch of commits. I don't know if it was intentional
>  or not but it seems that now resolved JIRAs (after CALCITE-5810 I
>  think) are pointing to non-existent commits.
> 
>  Can someone please update the JIRA tickets with the correct commit
>  hashes and also ensure that we didn't lose anything after the rebase?
> 
>  Best,
>  Stamatis
> 
>  [1] https://lists.apache.org/thread/7jjnbkkh9tv49sjcc5kg2tm7c54tj861
> >>>
>
>


RE: Re: Calcite for Lineage

2023-07-07 Thread Nathaniel Vala

Also, the YEAR expression is not the only one I am having issues with, for 
example there is also CONVERT, “Quoted Column Names” etc. It feels like I must 
have missed a step somewhere or need to define something in the Parser.

Worst case, how would I be able to extend some of these functions?

Kind Regards,
Nathaniel Vala


On 2023/06/23 17:12:19 Askar Bozcan wrote:
> Oh, true. It can parse CREATE TABLE however.
> CREATE TABLE is pretty much the only DDL I needed to parse, so I didn't
> really notice this.
> Thanks for the correction!
>
> - Askar
>
>
>
>
> On Fri, 23 Jun 2023 at 19:43, mailto:mb...@gmail.com>> wrote:
>
> > From my experience the Babel parser does NOT include DDL - at least not in
> > the released versions.
> >
> > Mihai
> >
> > -Original Message-
> > From: Askar Bozcan
> > Sent: Friday, June 23, 2023 1:14 AM
> > To: dev@calcite.apache.org
> > Subject: Re: Calcite for Lineage
> >
> > Hey Nathaniel,
> > To parse DDL statements, you need to use a different parser (see
> > SqlParser.Config.withParserFactory
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/parser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql.parser.SqlParserImplFactory)
> > >)
> > because core Calcite parser
> > <
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj
> > >
> > does
> > not support DDL statements by design. You have two options:
> > 1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as you've
> > said)
> > This is an extended parser that can also parse DDL queries.
> >
> > 2)  Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This is
> > also an extended parser that can not only parse DDL queries, but also many
> > extra things not present in ISO SQL standard, such as Postgre's infix CAST
> > operator *::* Since you're doing lineage generation I highly recommend
> > using babel package for maximum compatibility with different DBs' queries.
> >
> > *Mini-explanation on how parsing works in Calcite* Try Go To'ing to
> > SqlParserImpl code in your IDE. You're going to see a huge file, full of
> > almost nonsensical if's.
> > The reason is simple: Parser code is build-time generated Java code based
> > on rules defined in core Calcite parser <
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj
> > >.
> > Parser.jj is a JavaCC file; JavaCC 
> >  is
> > a parser generator that, based on rules you define, generates a pure Java
> > code which can parse LL(k) grammars and generate the parse tree (SqlNode).
> >
> > Parser.jj, as seen on the repo, is not a pure JavaCC file however, but an
> > Apache FreeMarker template. There are strings in the Parser.jj file that
> > start with *${ .* Those are placeholders used by Apache FreeMarker, a
> > templating engine.
> > Extended parsers (babel, ddlparser) use those placeholders to insert their
> > custom parsing rules without directly affecting the core parser file (it
> > all still happens build-time, however).
> > How Babel parser does it, for example:
> > https://github.com/apache/calcite/blob/main/babel/src/main/codegen
> >
> > *An advice for lineage generation*
> > If you're going to create a lineage generator, I highly recommend using a
> > relational tree (RelNode tree) instead of parse tree (SqlNode) if you have
> > access to DB tables.
> > After all, a lineage shows the relation between tables/columns, and so
> > does a relational tree. There is even a built-in method for lineage:
> > getExpressionLineage
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/metadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcite.rel.RelNode,org.apache.calcite.rex.RexNode)
> > >
> >
> > Another advice, if you're planning to support different kinds of DBs, you
> > will eventually run into something unparseable, and that will require the
> > extension of the core Parser/babel Parser. Since it's build time, I suggest
> > you submitting a PR to extend the parsers.
> >
> > Good luck,
> > Askar
> >
> >
> >
> > On 23 Jun 2023 Fri at 07:37 Nathaniel Vala 
> > mailto:na...@hotmail.com>>
> > wrote:
> >
> > > Hi All,
> > >
> > > I have been trying to build a java tool that would let people map
> > > lineage by reading sql scripts (i.e. views or insert into etc.) Im
> > > having a little trouble with a couple of things and was hoping for some
> > pointers.
> > >
> > > Firstly, I cant seem to parse any DDL statements (so trouble with
> > > `CREATE VIEW AS [SQL QUERY]`).I understand this is meant to be in the
> > > calcite-server module but cant really find anything).
> > > I decided to ignore the CREATE statements for the moment and just
> > > process the query to get the sources in it which was working on simple
> > > scripts but fail real quick when looking at things I've seen at
> > enterprises.
> > >
> > > I 

RE: Re: Calcite for Lineage

2023-07-07 Thread Nathaniel Vala
Hi Askar and Mihai,

Thanks for the help. I have modified the code to use a DDL parser, Minhai was 
right about the Babel one not having all the DDL statements in it. I have 
managed to parse this but still falling over at token that should parse.

When parsing queries using `SqlDdlParserImpl.FACTORY` and `Lex.SQL_SERVER` I 
try to parse a simple DATEADD function and it’s failing.

```sql
SELECT  DATEADD(year, 1, '2017/08/25') AS DateAdd
FROM unienrollements t1
```

Java throws an `InvocationTargetException` near ‘YEAR’ and that it was 
expecting 
However in that list, it has YEAR and YEAR (. To me, this should be able to 
parse, have I missed something in the config to enable this?

If you need some code, this can be found here:
https://github.com/Spydernaz/sqlLineage


Furthermore, thanks for the suggestion regarding the RelMetadata but for the 
life of me cannot config it. Seems like I need to have a full connection 
established and fetch live schemas for it to validate against? This isn’t 
really an option for me at the moment. I will upload the RelMetadata attempt in 
a separate branch of this repo

On 2023/06/23 17:12:19 Askar Bozcan wrote:
> Oh, true. It can parse CREATE TABLE however.
> CREATE TABLE is pretty much the only DDL I needed to parse, so I didn't
> really notice this.
> Thanks for the correction!
>
> - Askar
>
>
>
>
> On Fri, 23 Jun 2023 at 19:43, mailto:mb...@gmail.com>> wrote:
>
> > From my experience the Babel parser does NOT include DDL - at least not in
> > the released versions.
> >
> > Mihai
> >
> > -Original Message-
> > From: Askar Bozcan
> > Sent: Friday, June 23, 2023 1:14 AM
> > To: dev@calcite.apache.org
> > Subject: Re: Calcite for Lineage
> >
> > Hey Nathaniel,
> > To parse DDL statements, you need to use a different parser (see
> > SqlParser.Config.withParserFactory
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/parser/SqlParser.Config.html#withParserFactory(org.apache.calcite.sql.parser.SqlParserImplFactory)
> > >)
> > because core Calcite parser
> > <
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj
> > >
> > does
> > not support DDL statements by design. You have two options:
> > 1) Use SqlDdlParserImpl.FACTORY (from calcite-server package, as you've
> > said)
> > This is an extended parser that can also parse DDL queries.
> >
> > 2)  Use SqlBabelParserImpl.FACTORY (from calcite-babel package) This is
> > also an extended parser that can not only parse DDL queries, but also many
> > extra things not present in ISO SQL standard, such as Postgre's infix CAST
> > operator *::* Since you're doing lineage generation I highly recommend
> > using babel package for maximum compatibility with different DBs' queries.
> >
> > *Mini-explanation on how parsing works in Calcite* Try Go To'ing to
> > SqlParserImpl code in your IDE. You're going to see a huge file, full of
> > almost nonsensical if's.
> > The reason is simple: Parser code is build-time generated Java code based
> > on rules defined in core Calcite parser <
> > https://github.com/apache/calcite/blob/main/core/src/main/codegen/templates/Parser.jj
> > >.
> > Parser.jj is a JavaCC file; JavaCC 
> >  is
> > a parser generator that, based on rules you define, generates a pure Java
> > code which can parse LL(k) grammars and generate the parse tree (SqlNode).
> >
> > Parser.jj, as seen on the repo, is not a pure JavaCC file however, but an
> > Apache FreeMarker template. There are strings in the Parser.jj file that
> > start with *${ .* Those are placeholders used by Apache FreeMarker, a
> > templating engine.
> > Extended parsers (babel, ddlparser) use those placeholders to insert their
> > custom parsing rules without directly affecting the core parser file (it
> > all still happens build-time, however).
> > How Babel parser does it, for example:
> > https://github.com/apache/calcite/blob/main/babel/src/main/codegen
> >
> > *An advice for lineage generation*
> > If you're going to create a lineage generator, I highly recommend using a
> > relational tree (RelNode tree) instead of parse tree (SqlNode) if you have
> > access to DB tables.
> > After all, a lineage shows the relation between tables/columns, and so
> > does a relational tree. There is even a built-in method for lineage:
> > getExpressionLineage
> > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/metadata/RelMetadataQuery.html#getExpressionLineage(org.apache.calcite.rel.RelNode,org.apache.calcite.rex.RexNode)
> > >
> >
> > Another advice, if you're planning to support different kinds of DBs, you
> > will eventually run into something unparseable, and that will require the
> > extension of the core Parser/babel Parser. Since it's build time, I suggest
> > you submitting a PR to extend the parsers.
> >
> > Good luck,
> > Askar
> >
> >
> >
> > On 23 Jun 2023 Fri 

Re: Force push to calcite main

2023-07-07 Thread Julian Hyde
> I prefer keeping force push in place

+1. I use force push about a dozen times a year and hopefully no one notices. 
It helps the commit history clean.

I know that force pushes can confuse CI systems and make things confusing to 
other users, so I try to be judicious. I trust other committers to do the same. 
(I suspect that this was a learning experience for Tanner.)

Julian


> On Jul 7, 2023, at 4:12 AM, Stamatis Zampetakis  wrote:
> 
> @Tanner No worries we all did this at some point in time, thanks a lot
> for following up!
> 
> @Stanilovsky: I prefer keeping force push in place and avoiding messy
> reverts that are usually necessary in various situations where we make
> a mistake. All commits are archived so there is nothing that we can't
> fix (I think).
> 
> On Thu, Jul 6, 2023 at 3:45 PM Tanner Clary
>  wrote:
>> 
>> Hello,
>> 
>> This was my mistake, my apologies. I will update the hashes. Sorry for any
>> inconvenience.
>> 
>> Tanner
>> 
>> On Thu, Jul 6, 2023 at 3:34 AM stanilovsky evgeny <
>> estanilovs...@gridgain.com> wrote:
>> 
>>> I already told that community need to vote for prohibit force push.
>>> 
 Hello,
 
 It appears that there was a force push to main yesterday [1] rewriting
 the history for a bunch of commits. I don't know if it was intentional
 or not but it seems that now resolved JIRAs (after CALCITE-5810 I
 think) are pointing to non-existent commits.
 
 Can someone please update the JIRA tickets with the correct commit
 hashes and also ensure that we didn't lose anything after the rebase?
 
 Best,
 Stamatis
 
 [1] https://lists.apache.org/thread/7jjnbkkh9tv49sjcc5kg2tm7c54tj861
>>> 



Re: Java Doc about RexProgramBuilderBase

2023-07-07 Thread Julian Hyde
People refactoring code should remember that their IDE can move and rename 
fields but is not so good at changing documentation. (Maybe in a couple of 
years, with advances in generative AI?!)

And when documentation and code don’t line up, people don’t trust either. (I’m 
as guilty of this as anyone.)

> On Jul 7, 2023, at 5:31 AM, Benchao Li  wrote:
> 
> This usually happens. When javadoc and the implementation diverges, the
> javadoc is mostly possible wrong and need to be improved to match the real
> behavior.
> 
> For this specific case, I agree with you that the javadoc for
> `vParamNotNull` and `vDecimal(int arg)` are not correct, please fix them.
> (We can do this kind of trivial work without a Jira ticket)
> 
> Zhe Hu  于2023年7月7日周五 14:45写道:
> 
>> Hi community.
>> Recently, when I review CALCITE-5769(
>> https://github.com/apache/calcite/pull/3296), I found something a little
>> confusing.
>> 
>> First, the java doc in RexProgramBuilderBase.vParamNotNull(), which meant
>> to create non-nullable variable, but it’s returning description is
>> “nullable varchar variable”.
>> Second, we use vDecimal(int arg) to create nullable decimal variable, but
>> the RelDataType we pass in is “nonNullableDecimal”, which I think should be
>> “nullableDecimal”. So does the other vXxx() methods.
>> I’m not sure if I understand right here. If it’s something we can improve,
>> I’ll file a JIRA case to record and fix it.
>> 
>> 
>> Best regards,
>> Zhe Hu
>> 
>> 
> 
> -- 
> 
> Best,
> Benchao Li



Caching of schema from JDBC connection

2023-07-07 Thread Kirz, Thomas
Hi,

I am using Calcite parse retrieve the result and parameter types of 
PreparedStatements.
To do this, I create a JDBCSchema connected to a (SQLite or Postgres) data 
source (once) and then parse and convert to RelNode each statement string.
I wanted to find out if Calcite stores/caches the schema locally after 
converting the first SQL statement, so it does not have to connect to the 
database when converting subsequent statements.
This seems to be the case as it still works after shutting down the database 
after converting the first statement.
But while inspecting what happens within the framework's source code, I found 
JdbcSchema#getRelDataType which seems to indicate that a connection to the 
database is needed every time.

Am I perhaps misunderstanding how the schema is accessed?

Thank you for your help!

Thomas Kirz


Jenkins build is back to normal : Calcite » Calcite-snapshots #522

2023-07-07 Thread Apache Jenkins Server
See 




Re: redis-server.app request network problem during calcite build

2023-07-07 Thread Benchao Li
Hi Ran,

There is no installation of Redis while running the tests in Redis Adapter,
the test will start an embedded Redis server or docker container depends on
whether there is a docker environment[1].

[1]
https://github.com/apache/calcite/blob/f38b2a89b46d494b76eb318d1be54b6cb3aab59f/redis/src/test/java/org/apache/calcite/adapter/redis/RedisCaseBase.java#L72C5-L82

Ran Tao  于2023年7月7日周五 22:10写道:

> Hi, community.
>
> Every time my personal mac computer executes calcite build, *
> redis-server.app
> * will pop up to request the network (redis
> adapter?) I know this is a firewall policy issue, and I hope to
> automatically add it to the firewall policy to avoid manual confirmation
> every time , but the problem is that I did not find *redis-server.app
> * after scanning the hard disk. Is this
> automatically installed during calcite build? If yes, where is the
> directory?
>
>
> Best Regards,
> Ran Tao
> https://github.com/chucheng92
>


-- 

Best,
Benchao Li


redis-server.app request network problem during calcite build

2023-07-07 Thread Ran Tao
Hi, community.

Every time my personal mac computer executes calcite build, *redis-server.app
* will pop up to request the network (redis
adapter?) I know this is a firewall policy issue, and I hope to
automatically add it to the firewall policy to avoid manual confirmation
every time , but the problem is that I did not find *redis-server.app
* after scanning the hard disk. Is this
automatically installed during calcite build? If yes, where is the
directory?


Best Regards,
Ran Tao
https://github.com/chucheng92


[jira] [Created] (CALCITE-5822) Add BIT_LENGTH for spark dialect

2023-07-07 Thread Ran Tao (Jira)
Ran Tao created CALCITE-5822:


 Summary: Add BIT_LENGTH for spark dialect
 Key: CALCITE-5822
 URL: https://issues.apache.org/jira/browse/CALCITE-5822
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.35.0
Reporter: Ran Tao
Assignee: Ran Tao


Returns the string or binary data bit numbers.


> SELECT bit_length('Apache Calcite');
72 
> SELECT bit_length(x'537061726b2053514c'); 
72

 

https://spark.apache.org/docs/latest/api/sql/index.html#bit_length



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5821) Add FORMAT_NUMBER function (enabled in Hive and Spark library)

2023-07-07 Thread Runkang He (Jira)
Runkang He created CALCITE-5821:
---

 Summary: Add FORMAT_NUMBER function (enabled in Hive and Spark 
library)
 Key: CALCITE-5821
 URL: https://issues.apache.org/jira/browse/CALCITE-5821
 Project: Calcite
  Issue Type: New Feature
  Components: core
Affects Versions: 1.34.0
Reporter: Runkang He


Add FORMAT_NUMBER function (enabled in Hive and Spark library):

FORMAT_NUMBER(expr1, expr2) - Formats the number {{expr1}} like '#,###,###.##', 
rounded to {{expr2}} decimal places. If {{expr2}} is 0, the result has no 
decimal point or fractional part. {{expr2}} also accept a user specified 
format. This is supposed to function like MySQL's FORMAT.

See more details in 
[Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-StringFunctions]
 and 
[Spark|https://spark.apache.org/docs/latest/api/sql/index.html#format_number] 
doc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5820) Add PARSE_URL function (enabled in Hive and Spark library)

2023-07-07 Thread Runkang He (Jira)
Runkang He created CALCITE-5820:
---

 Summary: Add PARSE_URL function (enabled in Hive and Spark library)
 Key: CALCITE-5820
 URL: https://issues.apache.org/jira/browse/CALCITE-5820
 Project: Calcite
  Issue Type: New Feature
  Components: core
Affects Versions: 1.34.0
Reporter: Runkang He


Add PARSE_URL (enabled in Hive and Spark library):

PARSE_URL: Returns the specified part from the URL. Valid values for 
partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and 
USERINFO.

For example, parse_url('http://facebook.com/path1/p.php?k1=v1=v2#Ref1', 
'HOST') returns 'facebook.com'.

See more details in 
[Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-StringFunctions]
 and [Spark|https://spark.apache.org/docs/latest/api/sql/index.html#parse_url] 
doc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Java Doc about RexProgramBuilderBase

2023-07-07 Thread Benchao Li
This usually happens. When javadoc and the implementation diverges, the
javadoc is mostly possible wrong and need to be improved to match the real
behavior.

For this specific case, I agree with you that the javadoc for
`vParamNotNull` and `vDecimal(int arg)` are not correct, please fix them.
(We can do this kind of trivial work without a Jira ticket)

Zhe Hu  于2023年7月7日周五 14:45写道:

> Hi community.
> Recently, when I review CALCITE-5769(
> https://github.com/apache/calcite/pull/3296), I found something a little
> confusing.
>
> First, the java doc in RexProgramBuilderBase.vParamNotNull(), which meant
> to create non-nullable variable, but it’s returning description is
> “nullable varchar variable”.
> Second, we use vDecimal(int arg) to create nullable decimal variable, but
> the RelDataType we pass in is “nonNullableDecimal”, which I think should be
> “nullableDecimal”. So does the other vXxx() methods.
> I’m not sure if I understand right here. If it’s something we can improve,
> I’ll file a JIRA case to record and fix it.
>
>
> Best regards,
> Zhe Hu
>
>

-- 

Best,
Benchao Li


Re: Force push to calcite main

2023-07-07 Thread Stamatis Zampetakis
@Tanner No worries we all did this at some point in time, thanks a lot
for following up!

@Stanilovsky: I prefer keeping force push in place and avoiding messy
reverts that are usually necessary in various situations where we make
a mistake. All commits are archived so there is nothing that we can't
fix (I think).

On Thu, Jul 6, 2023 at 3:45 PM Tanner Clary
 wrote:
>
> Hello,
>
> This was my mistake, my apologies. I will update the hashes. Sorry for any
> inconvenience.
>
> Tanner
>
> On Thu, Jul 6, 2023 at 3:34 AM stanilovsky evgeny <
> estanilovs...@gridgain.com> wrote:
>
> > I already told that community need to vote for prohibit force push.
> >
> > > Hello,
> > >
> > > It appears that there was a force push to main yesterday [1] rewriting
> > > the history for a bunch of commits. I don't know if it was intentional
> > > or not but it seems that now resolved JIRAs (after CALCITE-5810 I
> > > think) are pointing to non-existent commits.
> > >
> > > Can someone please update the JIRA tickets with the correct commit
> > > hashes and also ensure that we didn't lose anything after the rebase?
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://lists.apache.org/thread/7jjnbkkh9tv49sjcc5kg2tm7c54tj861
> >


Java Doc about RexProgramBuilderBase

2023-07-07 Thread Zhe Hu
Hi community.
Recently, when I review 
CALCITE-5769(https://github.com/apache/calcite/pull/3296), I found something a 
little
confusing.

First, the java doc in RexProgramBuilderBase.vParamNotNull(), which meant to 
create non-nullable variable, but it’s returning description is “nullable 
varchar variable”.
Second, we use vDecimal(int arg) to create nullable decimal variable, but the 
RelDataType we pass in is “nonNullableDecimal”, which I think should be 
“nullableDecimal”. So does the other vXxx() methods.
I’m not sure if I understand right here. If it’s something we can improve, I’ll 
file a JIRA case to record and fix it.


Best regards,
Zhe Hu



Re: Optimal way to organize Joins in Calcite

2023-07-07 Thread Roman Kondakov

Hi Jonathan,

if you are using custom RelOptCost with custom cost model, you can apply 
your cost model to the logical nodes as well.


1. You need to initialize the planner with your implementation of the 
RelOptCostFactory


2. You can also override the default cost formula for any node (logical 
or physical) using this metadata handler [1]


[1] 
https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPercentageOriginalRows.java#L186


Thanks.

Roman.

On 07.07.2023 02:46, Jonathan Sternberg wrote:

Thanks.

For MultiJoin, I'm trying to get it to work with a custom cost model
and custom output convention. We have our costs for operations as part of
the implementation of the physical nodes. Since MultiJoin uses the costs to
determine the join ordering, I'm a bit concerned that it is using the costs
from the logical plans rather than our custom ones. Is it possible to
utilize the physical nodes with MultiJoin or does it have to be utilized
with logical nodes only?

--Jonathan Sternberg

On Tue, Jul 4, 2023 at 1:43 AM Roman Kondakov 
wrote:


Hi Jonathan,

1. As Julian mentioned, it's better to use heuristic join order for
large amount of joins

2. LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule AFAIK always
produce tree of joins, not a MultiJoin.

3. Yes, your understanding is correct. You can check the default join
order program [1]

[1]

https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/tools/Programs.java#L186

Thanks,

Roman.

On 03.07.2023 22:48, Julian Hyde wrote:

The reason that there are two strategies is because of large joins. If

your query joins 10 tables, the number of possible join orders is large
(bounded by 10 factorial I believe) and therefore would overwhelm the
Volcano planner, which must construct each possibility.

Therefore we have a heuristic algorithm that you should use for large

joins. We gather the entire FROM clause into a data structure called
MultiJoin, and a single rule call applies heuristics and spits out a join
order that is probably close to optimal.

When you are optimizing a query, you need to know whether you are in

danger of being swallowed by the monster that is the complexity of large
joins. If your query only joins 2 or 3 tables (and in some other situations
too) you are not in danger and can safely exhaustively enumerate plans.

On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg 

wrote:

Hi,

I'm presently working on optimizing the ordering of joins for queries

and

had a few questions about the optimal way to do that with Calcite.

I watched this meetup video (

https://www.youtube.com/watch?v=5wQojihyJDs)

and spent some time experimenting with JoinAssociateRule,

JoinCommuteRule,

and the rules related to MultiJoins. We're utilizing the volcano planner
for optimization at the present moment but also have the freedom to
customize the order and phases for the planner phases.

1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute
rules? Or are JoinAssociate and JoinCommute still recommended as the
standard way to handle reordering of joins?
2. Our system only supports performing the join over two inputs and we
can't support MultiJoin as a physical operation. My understanding is

that

the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange

the

join but will still produce a MultiJoin. What's the appropriate way to
convert a MultiJoin back to a set of joins?
3. My understanding is that MultiJoin rules aren't compatible with the
volcano planner and should be run as part of a stage using the heuristic
planner. Is this understanding correct?

Thank you for any help.

--Jonathan Sternberg