Refactor reference.md

2024-01-24 Thread Cancai Cai
Hey Calcite Devs,

I am currently working on CALCITE-6215
<https://issues.apache.org/jira/browse/CALCITE-6215>. During my work, I
have noticed that certain functions have multiple variations with different
parameter types in their respective databases. For example, in PostgreSQL,
the to_char function supports multiple forms such as to_char(timestamp,
text), to_char(interval, text), and to_char(numeric_type, text).

However, the description in Calcite is not clear enough. For instance, the
reference.md document describes the to_char function as follows:

| m o p | TO_CHAR(timestamp, format) | Converts *timestamp* to a string
using the format *format*.

This description may not provide enough clarity for users to understand the
usage of each function across different databases.

I suggest considering adding specific links to the corresponding database
functions in the reference.md document to enhance its completeness. This
would allow users to easily access the documentation for the respective
database functions.

Thanks as always,

Cancai Cai

https://www.postgresql.org/docs/16/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE


Re: Refactor reference.md

2024-01-25 Thread Cancai Cai
Can we add the corresponding link to the keyword representing the data source 
like m o p, for example: 
p<https://www.postgresql.org/docs/16/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME
 -TABLE>
But selecting the data source version also seems to be a problem

Thanks as always,

Cancai Cai

On 2024/01/24 22:07:23 Julian Hyde wrote:
> The current review process is that I hand-edit most of the submissions. (Or 
> make detailed suggestions: ‘Convert this verb from declarative to imperative. 
> Remove the space before the open parenthesis.') I we increase the scope of 
> the documentation, that puts more burden on me.
> 
> We should not increase the scope of the documentation unless we introduce 
> generation. Generation will at least allow us to automate some checks. Such 
> as that the examples actually parse, and return the results the doc says they 
> do.
> 
> Julian
> 
> 
> > On Jan 24, 2024, at 1:55 PM, Mihai Budiu  wrote:
> > 
> > I am not proposing a new process, the existing review process would 
> > continue to apply. The page would still be part of the repository. Just a 
> > separate web page on the calcite site, unbundled from the SQL language page.
> > 
> > Mihai
> > 
> > 
> > From: Julian Hyde 
> > Sent: Wednesday, January 24, 2024 1:27 PM
> > To: dev@calcite.apache.org 
> > Subject: Re: Refactor reference.md
> > 
> > "The documentation would be incrementally improved, like the code base." Or 
> > it might incrementally decline into a shambles. Sure, this is open source, 
> > and open source can sometimes create miracles, but we need to be realistic. 
> > We need an owner, and systems in place, to overcome the effects of entropy.
> > 
> > Other products have a separate page for each function, and an index 
> > containing all functions. For example, see PostGIS: 
> > https://postgis.net/docs/manual-1.5/ST_MakeLine.html. But take a look at 
> > the meta tags at the top of the page - it’s generated from DocBook. That is 
> > a tell that it is maintained by a professional writer.
> > 
> > Julian
> > 
> > 
> > 
> >> On Jan 24, 2024, at 1:12 PM, Mihai Budiu  wrote:
> >> 
> >> The documentation would be incrementally improved, like the codebase.
> >> We could start by just moving it to a different file. The narrow table 
> >> also makes it difficult to read, so perhaps we can reformat that. A third 
> >> column would be nice for examples, but it would mostly be empty initially. 
> >> Some functions require additional clarifications as long text, maybe these 
> >> can be footnotes?
> >> 
> >> Another solution is to make a separate table for each class of functions: 
> >> string, numeric, array, etc. That would make it easier to intersperse with 
> >> additional notes.
> >> 
> >> The terse format makes it very difficult to explain things that are 
> >> subtle. For example, see my PR https://github.com/apache/calcite/pull/3571 
> >> which only attempts to clarify something, but has not been approved since 
> >> early December.
> >> 
> >> Mihai
> >> 
> >> From: Julian Hyde 
> >> Sent: Wednesday, January 24, 2024 12:54 PM
> >> To: dev@calcite.apache.org 
> >> Subject: Re: Refactor reference.md
> >> 
> >> Extra documentation would be nice. But who is going to write (and 
> >> maintain) this extra documentation?
> >> 
> >> Even the current documentation takes a lot of work. When reviewing a PR to 
> >> add a function, I have to tell people to remove a ‘.’ at the end of the 
> >> line to be consistent with the existing doc. Without those efforts, the 
> >> documentation would be a shambles, and no one would trust it. We have over 
> >> 500 functions.
> >> 
> >> Julian
> >> 
> >> 
> >>> On Jan 24, 2024, at 9:46 AM, Mihai Budiu  wrote:
> >>> 
> >>> I think we should make a separate document for the functions, and in 
> >>> general give more details about the functions' behavior. The current 
> >>> model is to give a very brief description of the function, but that's 
> >>> often not enough, users have to resort to either experiments or to 
> >>> reading documentation from other databases. The behavior should be 
> >>> described for corner cases, and ideally there should be examples as well.
> >>> 
> >>> Mihai
> >>> 

Re: About SqlFunctions.java slimming in calcite

2024-01-30 Thread Cancai Cai
Hi Forward Xu,
I think this is a good idea. This can not only improve the readability and
maintainability of the code, but it may also be helpful for new developers
to learn. However, how to migrate safely is a big problem, because we
cannot influence User upgrade Maybe we should start with a String Function

Best,
Cancai Cai

Forward Xu  于2024年1月30日周二 14:06写道:

> hi Tanner,
> Thank you very much for your reply.
> In the ISO/IEC SQL standard, scalar functions can be classified into the
> following types:
> 1. String functions: Functions for manipulating string data, such as
> CONCAT(), SUBSTRING().
> 2. Numeric functions: Functions for working with numeric data, such as
> ABS(), ROUND().
> 3. Date and time functions: Functions for manipulating date and time data,
> such as DATEADD(), DATEDIFF().
> 4. System functions: Functions providing information about the database
> environment and other system-related details, such as CURRENT_DATE,
> CURRENT_USER.
>
> Best,
> ForwardXu
>
> Tanner Clary  于2024年1月30日周二 12:21写道:
>
> > I don't think it's a bad idea but I'm also not sure whether it's worth
> the
> > time and effort of deciding what functions should be grouped together,
> etc.
> > Seeing as how methods in that class are mostly (always?) called via
> > reflection, I think having them in one place is helpful in ensuring
> > consistent code style/patterns. Test classes in Calcite, for instance,
> vary
> > in thoroughness, documentation, formatting, etc, but individual tests in
> > the same class rarely vary from one another. That's just from a
> > maintenance/consistency perspective, if there's some performance issues
> or
> > otherwise that I'm neglecting then maybe it is worth the attention.
> >
> > Tanner
> >
> > On Mon, Jan 29, 2024 at 8:02 PM Forward Xu 
> wrote:
> >
> > > hi,
> > > I noticed that the code file of the SqlFunctions.java class is getting
> > > larger and larger and the functions are increasing. I think some
> > functions
> > > can be broken down into different function classes to slim down
> > > SqlFunctions. For example: json functions use JsonFunctions, and only
> > > SqlFunctions retain Function name reference.
> > >
> > > What do you think?
> > >
> > > Best,
> > > ForwardXu
> > >
> >
>


Thank you everyone

2024-02-05 Thread Cancai Cai
I have caused a lot of trouble to everyone since I joined the calcite
community, whether it is raising Jira specifications or making low-level
mistakes in PRs. These are things I should avoid, and I am sorry for this.
Thank you very much to all the committers and pmc for your guidance. I have
learned a lot and know that I have many shortcomings. I will improve them.
Thank you again.

Best wishes,
Cancai Cai


About Negative testing for adapting to different databases

2024-02-06 Thread Cancai Cai
Hello Calcite Community,
I'm currently completing CALCITE-6224
<https://issues.apache.org/jira/browse/CALCITE-6224>.
During the completion process, we found that the return result of log2(0)
in calcite is different from that in mysql and spark. The return result of
log2(0) in calcite is -Infinity. But mysql and spark return NULL. We are
considering whether we need to adapt the return results of log2(0) in mysql
and spark. But a new problem comes out, if I adapt log2(0). Then do I also
need to consider the negative test of the two functions log10 and log in
other databases?
I believe this happens to me not only in the log function, I observed the
same situation in CALCITE-6241
<https://issues.apache.org/jira/browse/CALCITE-6241>.
Now my doubt is whether we need to adapt the negative test of each function
in each database (because there are examples of negative test results
caused by different databases and different versions) .
I think we can add a negative test in calcite to tell users and developers
how calcite handles the negatvie test of this kind of function, but I am
not sure whether it needs to be adapted.

Best wishes,
Cancai Cai


Re: About Negative testing for adapting to different databases

2024-02-08 Thread Cancai Cai
Hi,
Thank your for yout reply

Best wishes,
Cancai Cai

Julian Hyde  于2024年2月9日周五 05:17写道:

> Functions have slightly different behaviors on different databases.
> Solving the problem starts with logging a jira case, and having a
> discussion about the specification.
>
> Rather than say (for example) ’SOUNDEX returns the wrong result’, it’s
> helpful to say ’SOUNDEX result for empty string is inconsistent with
> MySQL’. Then we can discover that SOUNDEX has different behaviors on MySQL
> and Postgres, decide whether we want to support both behaviors, and if so,
> decide how we will achieve that behavior.
>
> As I have said many times: log a jira first, and only then get to work on
> a PR.
>
> Julian
>
>
> > On Feb 6, 2024, at 8:38 AM, Mihai Budiu  wrote:
> >
> > These are two different functions with the same name.
> > You need two different implementations.
> > There are other similar functions in Calcite, e.g., SOUNDEX.
> >
> https://calcite.apache.org/docs/reference.html#dialect-specific-operators
> >
> > Mihai
> > 
> > From: Cancai Cai 
> > Sent: Tuesday, February 6, 2024 8:26 AM
> > To: dev@calcite.apache.org 
> > Subject: About Negative testing for adapting to different databases
> >
> > Hello Calcite Community,
> > I'm currently completing CALCITE-6224
> > <https://issues.apache.org/jira/browse/CALCITE-6224>.
> > During the completion process, we found that the return result of log2(0)
> > in calcite is different from that in mysql and spark. The return result
> of
> > log2(0) in calcite is -Infinity. But mysql and spark return NULL. We are
> > considering whether we need to adapt the return results of log2(0) in
> mysql
> > and spark. But a new problem comes out, if I adapt log2(0). Then do I
> also
> > need to consider the negative test of the two functions log10 and log in
> > other databases?
> > I believe this happens to me not only in the log function, I observed the
> > same situation in CALCITE-6241
> > <https://issues.apache.org/jira/browse/CALCITE-6241>.
> > Now my doubt is whether we need to adapt the negative test of each
> function
> > in each database (because there are examples of negative test results
> > caused by different databases and different versions) .
> > I think we can add a negative test in calcite to tell users and
> developers
> > how calcite handles the negatvie test of this kind of function, but I am
> > not sure whether it needs to be adapted.
> >
> > Best wishes,
> > Cancai Cai
>
>


Some questions about calcite

2024-02-22 Thread Cancai Cai
When I was working CALCITE-6224
, I encountered some
problems and I always had some doubts in my heart.
I thought about it for a long time, maybe I think I already understand the
doubts in my heart.

As @mihaibudiu said, Java grammar has its own type rules, and SQL has its
own type rules. What calcite currently does is to use Java syntax to adapt
to the SQL rules of each database to complete execution optimization. In
some extreme scenarios, the SQL rules of various databases are
inconsistent. Calcite
needs to be sure to adapt to these extreme situations. But, I mean, if one
day, for example, mysql returns the result of log10(0) as an error instead
of null, then does calcite need to adapt to the new version of mysql? If it
adapts to the new version of mysql, does calcite still need to adapt to the
old version of mysql? It seems to me that this may be a paradox. Because in
my opinion, it is very difficult to 100% adapt to the SQL dialect of all
databases, because different dialects of each database need to be
considered, and there may even be differences between versions of different
versions of databases.

Can anyone explain it to me? I would be very grateful.


Re: Some questions about calcite

2024-02-22 Thread Cancai Cai
Hi, everyone
Thank you very much for your answers. In fact, my doubts may not only be about 
adapting log2 (function), as I have discovered a lot when adapting log2 
functions, such as log10(0) about the difference in return values ​​of mysql 
and postgres, about calcite Regarding the handling of infinities, I don't know 
why I haven't discovered them before. As clay said, calcite may have had its 
own set of standards before, but I may not be clear about this. I hope that 
calcite’s return value of the math function in certain calculation situations 
will meet the requirements of mathematics or SQL. rule. If mysql is lazy from 
the beginning, should we still follow it instead of just adapting to an extreme 
situation?

Best wishes.
Cancai Cai

On 2024/02/23 00:45:13 Tanner Clary wrote:
> I feel like I'm missing something about this whole issue. We have
> implemented so many functions that there's probably an existing pattern for
> just about any issue with dialect parity we encounter. What's the core
> problem? What behavior is so difficult to emulate and why? Caican let me
> know if you want to pair I'm happy to manage the edge cases if you want.
> 
> Tanner
> 
> On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde  wrote:
> 
> > I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> > accompanying PR muddies the waters because it also mentions Spark, Postgres
> > and “many databases”. The case should state that the function is consistent
> > with MySQL and returns NULL if the argument is non-positive.
> >
> > > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> > >
> > > In the case of log2 it's simple, because the documentation says that it
> > comes from the MySQL dialect. So there is a spec and a golden
> > implementation to compare against.
> > >
> > > I certainly won't object to implementing a separate log2 function that
> > is undefined for 0 and negative values (i.e., can return any value for such
> > arguments), let's just not pretend it's the MySQL function.
> > >
> > > Mihai
> > > 
> > > From: Julian Hyde 
> > > Sent: Thursday, February 22, 2024 4:05 PM
> > > To: dev@calcite.apache.org 
> > > Subject: Re: Some questions about calcite
> > >
> > > But what is the spec of the LOG2 function? It’s not in the SQL standard.
> > So, we need to write our own spec. We can say that LOG2(0) returns 42, if
> > we wish, and go implement our own spec.
> > >
> > > Yes, Calcite is a compiler, but it is also a standard library, and it is
> > also an extended library. LOG2 is in the latter category. If you, as a
> > vendor, don’t trust the implementation of LOG2 then you can exclude it from
> > your distribution.
> > >
> > > As an open source project we have to BOTH improve the quality of our
> > core and lower the barrier to contributions to the non-core code. We have
> > to recognize that not everything is the same standard. And I think vendors,
> > like your company, who want to deliver a high-quality experience should put
> > barriers around what features are trusted.
> > >
> > > Julian
> > >
> > >> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> > >>
> > >> If we can't even implement correctly the log2 function according to its
> > spec, there is no hope that we will implement anything correctly.
> > >>
> > >> I am not a QA person, but I am spending more than 50% of my time
> > diagnosing and fixing bugs in Calcite. It's not fun. I would rather
> > implement interesting new functionality. But I cannot tell a user of our
> > tools "I have no idea whether the results you get using this tool will be
> > correct. If you are lucky, they will be, don't worry about corner cases."
> > Our goal is to use Calcite in a production environment. If Calcite is
> > designed to be just a research tool, maybe we should make that clear.
> > >>
> > >> There are fundamental bugs in Calcite which have been there for a
> > decade. Even basic things like arithmetic casts are still incorrect. Which
> > is proof that once a bug is in, people are not incentivized to fix them. We
> > should not let bugs in deliberately. They may essentially never get fixed.
> > >>
> > >> I don't think a compiler can cut any corners. The compiler is the
> > foundation of an entire software ecosystem. If the foundation is broken,
> > everything crumbles.
> > >>
> > >> Mihai
> > >>
> > >>

Re: Some questions about calcite

2024-02-22 Thread Cancai Cai
I am still happy to implement them. As far as adapting the log2 function is
concerned, at present I may just not be able to guarantee that log2(0)
returns null, and I have encountered some stuck points here. But you can
see that I've raised two other jira issues [CALCITE-6259]
<https://issues.apache.org/jira/browse/CALCITE-6259>, [CALCITE-6277]
<https://issues.apache.org/jira/browse/CALCITE-6277> about this, and I'm
going to want to fix the jira issue that I raised, because that's what I
found, and I should do it. This is why I keep saying that I can merge this
jira first. Regarding extreme scenarios, I will continue to adapt the type
of discourse in the next jira, because I am not sure whether my current
writing method is appropriate in the next jira.

Cancai Cai  于2024年2月23日周五 09:50写道:

> Hi, everyone
> Thank you very much for your answers. In fact, my doubts may not only be
> about adapting log2 (function), as I have discovered a lot when adapting
> log2 functions, such as log10(0) about the difference in return values ​​of
> mysql and postgres, about calcite Regarding the handling of infinities, I
> don't know why I haven't discovered them before. As clay said, calcite may
> have had its own set of standards before, but I may not be clear about
> this. I hope that calcite’s return value of the math function in certain
> calculation situations will meet the requirements of mathematics or SQL.
> rule. If mysql is lazy from the beginning, should we still follow it
> instead of just adapting to an extreme situation?
>
> Best wishes.
> Cancai Cai
>
> On 2024/02/23 00:45:13 Tanner Clary wrote:
> > I feel like I'm missing something about this whole issue. We have
> > implemented so many functions that there's probably an existing pattern
> for
> > just about any issue with dialect parity we encounter. What's the core
> > problem? What behavior is so difficult to emulate and why? Caican let me
> > know if you want to pair I'm happy to manage the edge cases if you want.
> >
> > Tanner
> >
> > On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde 
> wrote:
> >
> > > I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> > > accompanying PR muddies the waters because it also mentions Spark,
> Postgres
> > > and “many databases”. The case should state that the function is
> consistent
> > > with MySQL and returns NULL if the argument is non-positive.
> > >
> > > > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> > > >
> > > > In the case of log2 it's simple, because the documentation says that
> it
> > > comes from the MySQL dialect. So there is a spec and a golden
> > > implementation to compare against.
> > > >
> > > > I certainly won't object to implementing a separate log2 function
> that
> > > is undefined for 0 and negative values (i.e., can return any value for
> such
> > > arguments), let's just not pretend it's the MySQL function.
> > > >
> > > > Mihai
> > > > 
> > > > From: Julian Hyde 
> > > > Sent: Thursday, February 22, 2024 4:05 PM
> > > > To: dev@calcite.apache.org 
> > > > Subject: Re: Some questions about calcite
> > > >
> > > > But what is the spec of the LOG2 function? It’s not in the SQL
> standard.
> > > So, we need to write our own spec. We can say that LOG2(0) returns 42,
> if
> > > we wish, and go implement our own spec.
> > > >
> > > > Yes, Calcite is a compiler, but it is also a standard library, and
> it is
> > > also an extended library. LOG2 is in the latter category. If you, as a
> > > vendor, don’t trust the implementation of LOG2 then you can exclude it
> from
> > > your distribution.
> > > >
> > > > As an open source project we have to BOTH improve the quality of our
> > > core and lower the barrier to contributions to the non-core code. We
> have
> > > to recognize that not everything is the same standard. And I think
> vendors,
> > > like your company, who want to deliver a high-quality experience
> should put
> > > barriers around what features are trusted.
> > > >
> > > > Julian
> > > >
> > > >> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> > > >>
> > > >> If we can't even implement correctly the log2 function according to
> its
> > > spec, there is no hope that we will implement anything correctly.
> > > >>
> > > >&

Re: Some questions about calcite

2024-02-22 Thread Cancai Cai
Maybe I also overlooked some issues in the whole jira case

Cancai Cai

On 2024/02/23 02:01:53 Cancai Cai wrote:
> I am still happy to implement them. As far as adapting the log2 function is
> concerned, at present I may just not be able to guarantee that log2(0)
> returns null, and I have encountered some stuck points here. But you can
> see that I've raised two other jira issues [CALCITE-6259]
> <https://issues.apache.org/jira/browse/CALCITE-6259>, [CALCITE-6277]
> <https://issues.apache.org/jira/browse/CALCITE-6277> about this, and I'm
> going to want to fix the jira issue that I raised, because that's what I
> found, and I should do it. This is why I keep saying that I can merge this
> jira first. Regarding extreme scenarios, I will continue to adapt the type
> of discourse in the next jira, because I am not sure whether my current
> writing method is appropriate in the next jira.
> 
> Cancai Cai  于2024年2月23日周五 09:50写道:
> 
> > Hi, everyone
> > Thank you very much for your answers. In fact, my doubts may not only be
> > about adapting log2 (function), as I have discovered a lot when adapting
> > log2 functions, such as log10(0) about the difference in return values ​​of
> > mysql and postgres, about calcite Regarding the handling of infinities, I
> > don't know why I haven't discovered them before. As clay said, calcite may
> > have had its own set of standards before, but I may not be clear about
> > this. I hope that calcite’s return value of the math function in certain
> > calculation situations will meet the requirements of mathematics or SQL.
> > rule. If mysql is lazy from the beginning, should we still follow it
> > instead of just adapting to an extreme situation?
> >
> > Best wishes.
> > Cancai Cai
> >
> > On 2024/02/23 00:45:13 Tanner Clary wrote:
> > > I feel like I'm missing something about this whole issue. We have
> > > implemented so many functions that there's probably an existing pattern
> > for
> > > just about any issue with dialect parity we encounter. What's the core
> > > problem? What behavior is so difficult to emulate and why? Caican let me
> > > know if you want to pair I'm happy to manage the edge cases if you want.
> > >
> > > Tanner
> > >
> > > On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde 
> > wrote:
> > >
> > > > I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> > > > accompanying PR muddies the waters because it also mentions Spark,
> > Postgres
> > > > and “many databases”. The case should state that the function is
> > consistent
> > > > with MySQL and returns NULL if the argument is non-positive.
> > > >
> > > > > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> > > > >
> > > > > In the case of log2 it's simple, because the documentation says that
> > it
> > > > comes from the MySQL dialect. So there is a spec and a golden
> > > > implementation to compare against.
> > > > >
> > > > > I certainly won't object to implementing a separate log2 function
> > that
> > > > is undefined for 0 and negative values (i.e., can return any value for
> > such
> > > > arguments), let's just not pretend it's the MySQL function.
> > > > >
> > > > > Mihai
> > > > > 
> > > > > From: Julian Hyde 
> > > > > Sent: Thursday, February 22, 2024 4:05 PM
> > > > > To: dev@calcite.apache.org 
> > > > > Subject: Re: Some questions about calcite
> > > > >
> > > > > But what is the spec of the LOG2 function? It’s not in the SQL
> > standard.
> > > > So, we need to write our own spec. We can say that LOG2(0) returns 42,
> > if
> > > > we wish, and go implement our own spec.
> > > > >
> > > > > Yes, Calcite is a compiler, but it is also a standard library, and
> > it is
> > > > also an extended library. LOG2 is in the latter category. If you, as a
> > > > vendor, don’t trust the implementation of LOG2 then you can exclude it
> > from
> > > > your distribution.
> > > > >
> > > > > As an open source project we have to BOTH improve the quality of our
> > > > core and lower the barrier to contributions to the non-core code. We
> > have
> > > > to recognize that not everything is the same standard. And I think
> > vendors,
&g

Re: Some questions about calcite

2024-02-23 Thread Cancai Cai
Tanner Clary,

Thank you very much for your help. I didn't expect it to work like this. I
learned interesting knowledge.

It seems that the infinity conversion is indeed not converted to double,
but converted to BigDecimal type. I have left a message on jira and I will
try to solve it.

Thank you again for your help. I am very sorry that I have caused a lot of
trouble to you.

Best wishes,
Cancai Cai

Tanner Clary  于2024年2月24日周六 00:10写道:

> Cancai,
>
> I have authored a commit that shows you how to allow functions with
> non-null operands to return null [1]. Feel free to use this in your PR.
>
> As for infinity, Calcite supports infinity, although I believe you need to
> wrap it in a cast to double (at least in my experience). If some dialect
> handles infinity differently maybe we can look at supporting that.
>
> Tanner
>
> [1]
>
> https://github.com/tanclary/calcite/commit/a6ee349f73aa8bf34aea38f33ad39581001c932e
>
> On Thu, Feb 22, 2024 at 6:43 PM Ran Tao  wrote:
>
> > Thanks cancai for opening this discussion.
> >
> > One additional point, regarding your so-called multi-version
> implementation
> > of a certain library, such as Spark.
> > Personally, I think we can implement it according to the latest release
> > version.
> > There will be updates in the future, which can be aligned in calcite.
> > For example, ArrayInsert had a bug in early versions of Spark[1], and
> > calcite's implementation based on it was already wrong.
> >
> > In fact, there are many Library functions in calcite, including
> > mysql/pg/spark, etc., and their implementations are outdated.
> > It is true that we have not discovered or planned to fix them.
> >
> > I bring this up because one of these open source projects is always
> moving
> > forward, calcite also needs to synchronize their modifications.
> >
> > [1] https://issues.apache.org/jira/browse/SPARK-45078
> >
> > Best Regards,
> > Ran Tao
> >
> >
> > Cancai Cai  于2024年2月23日周五 10:10写道:
> >
> > > Maybe I also overlooked some issues in the whole jira case
> > >
> > > Cancai Cai
> > >
> > > On 2024/02/23 02:01:53 Cancai Cai wrote:
> > > > I am still happy to implement them. As far as adapting the log2
> > function
> > > is
> > > > concerned, at present I may just not be able to guarantee that
> log2(0)
> > > > returns null, and I have encountered some stuck points here. But you
> > can
> > > > see that I've raised two other jira issues [CALCITE-6259]
> > > > <https://issues.apache.org/jira/browse/CALCITE-6259>, [CALCITE-6277]
> > > > <https://issues.apache.org/jira/browse/CALCITE-6277> about this, and
> > I'm
> > > > going to want to fix the jira issue that I raised, because that's
> what
> > I
> > > > found, and I should do it. This is why I keep saying that I can merge
> > > this
> > > > jira first. Regarding extreme scenarios, I will continue to adapt the
> > > type
> > > > of discourse in the next jira, because I am not sure whether my
> current
> > > > writing method is appropriate in the next jira.
> > > >
> > > > Cancai Cai  于2024年2月23日周五 09:50写道:
> > > >
> > > > > Hi, everyone
> > > > > Thank you very much for your answers. In fact, my doubts may not
> only
> > > be
> > > > > about adapting log2 (function), as I have discovered a lot when
> > > adapting
> > > > > log2 functions, such as log10(0) about the difference in return
> > values
> > > ​​of
> > > > > mysql and postgres, about calcite Regarding the handling of
> > > infinities, I
> > > > > don't know why I haven't discovered them before. As clay said,
> > calcite
> > > may
> > > > > have had its own set of standards before, but I may not be clear
> > about
> > > > > this. I hope that calcite’s return value of the math function in
> > > certain
> > > > > calculation situations will meet the requirements of mathematics or
> > > SQL.
> > > > > rule. If mysql is lazy from the beginning, should we still follow
> it
> > > > > instead of just adapting to an extreme situation?
> > > > >
> > > > > Best wishes.
> > > > > Cancai Cai
> > > > >
> > > > > On 2024/02/23 00:45:13 Tanner Clary wrote:
> > > > > > I feel like I'm missing something about this who

Re: Some questions about calcite

2024-02-23 Thread Cancai Cai
Hi, Tanner Clary

I also found that some sql functions return java's double.NaN when it is
empty, instead of sql's null. Maybe I will raise a jira to discuss this.

Best wishes,

Cancai Cai

Cancai Cai  于2024年2月24日周六 00:32写道:

> Tanner Clary,
>
> Thank you very much for your help. I didn't expect it to work like this. I
> learned interesting knowledge.
>
> It seems that the infinity conversion is indeed not converted to double,
> but converted to BigDecimal type. I have left a message on jira and I will
> try to solve it.
>
> Thank you again for your help. I am very sorry that I have caused a lot of
> trouble to you.
>
> Best wishes,
> Cancai Cai
>
> Tanner Clary  于2024年2月24日周六 00:10写道:
>
>> Cancai,
>>
>> I have authored a commit that shows you how to allow functions with
>> non-null operands to return null [1]. Feel free to use this in your PR.
>>
>> As for infinity, Calcite supports infinity, although I believe you need to
>> wrap it in a cast to double (at least in my experience). If some dialect
>> handles infinity differently maybe we can look at supporting that.
>>
>> Tanner
>>
>> [1]
>>
>> https://github.com/tanclary/calcite/commit/a6ee349f73aa8bf34aea38f33ad39581001c932e
>>
>> On Thu, Feb 22, 2024 at 6:43 PM Ran Tao  wrote:
>>
>> > Thanks cancai for opening this discussion.
>> >
>> > One additional point, regarding your so-called multi-version
>> implementation
>> > of a certain library, such as Spark.
>> > Personally, I think we can implement it according to the latest release
>> > version.
>> > There will be updates in the future, which can be aligned in calcite.
>> > For example, ArrayInsert had a bug in early versions of Spark[1], and
>> > calcite's implementation based on it was already wrong.
>> >
>> > In fact, there are many Library functions in calcite, including
>> > mysql/pg/spark, etc., and their implementations are outdated.
>> > It is true that we have not discovered or planned to fix them.
>> >
>> > I bring this up because one of these open source projects is always
>> moving
>> > forward, calcite also needs to synchronize their modifications.
>> >
>> > [1] https://issues.apache.org/jira/browse/SPARK-45078
>> >
>> > Best Regards,
>> > Ran Tao
>> >
>> >
>> > Cancai Cai  于2024年2月23日周五 10:10写道:
>> >
>> > > Maybe I also overlooked some issues in the whole jira case
>> > >
>> > > Cancai Cai
>> > >
>> > > On 2024/02/23 02:01:53 Cancai Cai wrote:
>> > > > I am still happy to implement them. As far as adapting the log2
>> > function
>> > > is
>> > > > concerned, at present I may just not be able to guarantee that
>> log2(0)
>> > > > returns null, and I have encountered some stuck points here. But you
>> > can
>> > > > see that I've raised two other jira issues [CALCITE-6259]
>> > > > <https://issues.apache.org/jira/browse/CALCITE-6259>,
>> [CALCITE-6277]
>> > > > <https://issues.apache.org/jira/browse/CALCITE-6277> about this,
>> and
>> > I'm
>> > > > going to want to fix the jira issue that I raised, because that's
>> what
>> > I
>> > > > found, and I should do it. This is why I keep saying that I can
>> merge
>> > > this
>> > > > jira first. Regarding extreme scenarios, I will continue to adapt
>> the
>> > > type
>> > > > of discourse in the next jira, because I am not sure whether my
>> current
>> > > > writing method is appropriate in the next jira.
>> > > >
>> > > > Cancai Cai  于2024年2月23日周五 09:50写道:
>> > > >
>> > > > > Hi, everyone
>> > > > > Thank you very much for your answers. In fact, my doubts may not
>> only
>> > > be
>> > > > > about adapting log2 (function), as I have discovered a lot when
>> > > adapting
>> > > > > log2 functions, such as log10(0) about the difference in return
>> > values
>> > > ​​of
>> > > > > mysql and postgres, about calcite Regarding the handling of
>> > > infinities, I
>> > > > > don't know why I haven't discovered them before. As clay said,
>> > calcite
>> > > may
>> > > > > have had its own set of standards before, but I may not be

Re: Reviewing blog repository

2024-02-25 Thread Cancai Cai
so cool , it will be very useful for beginners and users.
thank you very much

Best regards,
Cancai cai

Zhengqiang Duan  于2024年2月25日周日 19:08写道:

> @Jiajun, thank you very much for putting together such a great study
> material.
>
> Best regards,
> Zhengqiang
>
> Jiajun Xie  于2024年2月25日周日 17:04写道:
> >
> > Hello all,
> >
> > When I read blogs about Calcite,
> > I always want to replicate other people's examples on my own.
> >
> > Constructing examples requires a lot of pre-work,
> > which can be time-consuming for newcomers.
> >
> > So I built a repository that combines code and blogs.
> > Everyone can read blogs and debug code directly without any additional
> > preparation work.
> >
> > Now I have the first chapter:
> > https://github.com/JiajunBernoulli/calcite-notes/tree/cbo
> > Welcome to comment on this PR:
> > https://github.com/JiajunBernoulli/calcite-notes/pull/1
>


Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

2024-02-25 Thread Cancai Cai
Thank you very much to the calcite community for raising these questions.
This is what I have been doubting. I am very sorry that this doubt has been
discussed for so long.

Maybe we also need to consider another issue, that is, the database version
issue. Versions like mysql and postgres are very stable, but components
like spark still seem to have function bugs. exists, then how should we
consider version issues?

I don't know what I can do. Recently I am sorting out some documents about
the use of some functions of mysql and postgres in calcite. I don't know if
this is helpful.

Best wishes,
Cancai Cai


Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

2024-02-27 Thread Cancai Cai
For udf pushdown, I think we can refer to spark sql jdbc[1]. I tried it
with spark sql, which pushed my desired udf down to the corresponding jdbc
data source, which required some minor changes.

Best wishes,
Cancai Cai

[1]
https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc

Bertil Chapuis  于2024年2月27日周二 14:28写道:

> Having the ability to write tests with testcontainers would greatly help
> at improving the PostGIS dialect.
>
> Until recently, the official docker image for PostGIS lacked support for
> arm64. However, one can easily change the underlying image used by
> testcontainers [1]. I guess that spawning a Spark SQL container shouldn’t
> be too difficult.
>
> As a side note, I recently opened CALCITE-6281 [2]. From what I
> understand, it is currently not possible to push down UDFs (such as the ST
> functions) to the database. More work in operator tables seems to be
> required and additional tests would probably help at clarifying the
> situation.
>
> [1]
> https://github.com/bchapuis/calcite-postgis-tests/blob/f7d7df1dbff67d5beed2872f44f92b85ed18c18f/src/test/java/com/github/bchapuis/calcite_postgis_tests/AbstractPostgisTest.java#L38
> [2] https://issues.apache.org/jira/browse/CALCITE-6271
>
>
>
> > On 26 Feb 2024, at 23:48, Julian Hyde  wrote:
> >
> > I have logged https://issues.apache.org/jira/browse/CALCITE-6281.
> >
> > On Mon, Feb 26, 2024 at 12:06 PM Hanumath Maduri
> >  wrote:
> >>
> >> It seems to have support for PostGIS, but I didn't test it though.
> >>
> >>
> https://github.com/testcontainers/testcontainers-java/blob/main/modules/postgresql/src/main/java/org/testcontainers/containers/PostgisContainerProvider.java
> >> https://testcontainers.com/modules/postgis/
> >>
> >> Thanks
> >> Hanu
> >>
> >> On Mon, Feb 26, 2024 at 11:20 AM Julian Hyde  wrote:
> >>
> >>> One more thing. The geospatial library, including Bertil's proposed
> >>> PostGIS dialect [5], would also benefit from this effort. We would
> >>> want to ensure that "ST_Contains(ST_Point(0.0, 0.0), ST_Point(0.0,
> >>> 0.0))" returns the same value on PostGIS as Calcite. Does
> >>> testContainers support PostGIS?
> >>>
> >>> Julian
> >>>
> >>> [5] https://issues.apache.org/jira/browse/CALCITE-6239
> >>>
> >>> On Mon, Feb 26, 2024 at 11:07 AM Julian Hyde  wrote:
> >>>>
> >>>> Thank you, everyone for your thoughts so far. (And let's keep
> talking!)
> >>>>
> >>>> I don't yet see a full, perfect solution. But I see a number of
> >>>> techniques/tools we can use. For example, I was not aware of
> >>>> testContainers but it seems to very quickly address our most common
> >>>> use cases (MySQL + Postgres). Also, while I like Quidem it probably
> >>>> should not be our first solution to this problem.
> >>>>
> >>>> I have two concerns about testContainers:
> >>>> 1. it makes our test suite heavier: it adds latency, memory
> >>>> requirements, flakiness, and the requirement for Docker;
> >>>> 2. it doesn't solve other important dialects, such as BigQuery.
> >>>>
> >>>> To solve (1), we could add the tests to an optional module, such as
> >>>> 'plus'. To solve (2) we need to provide another way to get a
> >>>> connection to a reference database. (We need that for the other huge
> >>>> project, the converse of this project, to test whether Calcite's
> >>>> generated SQL is compliant [1].)
> >>>>
> >>>> I think we should start by merging Stamatis' PR into the 'plus'
> >>>> module, and add some tests for CONCAT, which we have already
> >>>> implemented, and has different behaviors in MySQL, Postgres, Oracle
> >>>> [2][3]. Let's enable that test in CI and make sure that it doesn't
> >>>> introduce flakiness.
> >>>>
> >>>> Next, we could make that test invoke tests that are already defined in
> >>>> SqlOperatorTest, in addition to/instead of, the CSV data in Stamatis'
> >>>> PR.
> >>>>
> >>>> Next, devise a way to test against Spark SQL and BigQuery, and check
> >>>> our implementation of SOUNDEX (different in Spark than
> >>>> Postgres/Oracle) and SUBSTR/SUBSTRING (different on Postgres/BigQuery)
> >&g

[Question] Regarding the loading conflict between SqlStdOperatorTable and SqlLibraryOperators

2024-03-05 Thread Cancai Cai
Hello, everyone in the calcite community.

I'm not sure if I should ask this question.

In the process of fixing CALCITE-6259
<https://issues.apache.org/jira/browse/CALCITE-6259>, I found that I
defined a log10_MS function in SqlLibraryOperators to simulate the behavior
of the log function in mysql, but the ReturnTypes.DOUBLE_FORCE_NULLABLE I
set was not Effective, its return value still complies with the ReturnTypes
of the LOG10 function in SqlStdOperatorTable. It seems to be a class
loading conflict. I tried to modify sqlkind, but it didn’t take effect. Is
there any good way?

Thank you all.

Best wishes.
Cancai Cai


Re: [ANNOUNCE] Sergey Nuyanzin joins Calcite PMC

2024-03-05 Thread Cancai Cai
Congratulations, Sergey!


Re: [Question] Regarding the loading conflict between SqlStdOperatorTable and SqlLibraryOperators

2024-03-10 Thread Cancai Cai
It seems like I can't find a suitable way to solve this problem, can you
give me some suggestions?

Best regards,
Cancai Cai

Benchao Li  于2024年3月5日周二 19:26写道:

> It's not a class loading problem, it's an operator lookup mechanism,
> which is a by design behavior, you can see details here[1]. If there
> are multiple candidates in the result of operator lookup, the first
> one will be returned.
>
> [1]
> https://github.com/apache/calcite/blob/d64ee0037db7abacdfcd4e774072f5e3fb8f2998/core/src/main/java/org/apache/calcite/sql/SqlUtil.java#L494-L517
>
> Cancai Cai  于2024年3月5日周二 16:53写道:
> >
> > Hello, everyone in the calcite community.
> >
> > I'm not sure if I should ask this question.
> >
> > In the process of fixing CALCITE-6259
> > <https://issues.apache.org/jira/browse/CALCITE-6259>, I found that I
> > defined a log10_MS function in SqlLibraryOperators to simulate the
> behavior
> > of the log function in mysql, but the ReturnTypes.DOUBLE_FORCE_NULLABLE I
> > set was not Effective, its return value still complies with the
> ReturnTypes
> > of the LOG10 function in SqlStdOperatorTable. It seems to be a class
> > loading conflict. I tried to modify sqlkind, but it didn’t take effect.
> Is
> > there any good way?
> >
> > Thank you all.
> >
> > Best wishes.
> > Cancai Cai
>
>
>
> --
>
> Best,
> Benchao Li
>


Re: [Question] Regarding the loading conflict between SqlStdOperatorTable and SqlLibraryOperators

2024-03-12 Thread Cancai Cai
Hello, I conducted debugging, and the result is as you said. I may need
some time to think about how to modify this part of the code. Thank you
very much for your suggestion.

Best regards,
Cancai Cai

Benchao Li  于2024年3月11日周一 21:09写道:

> I didn't go deep into CALCITE-6259, so I might not able to give you
> the precise solution / code. Generally, you can add your operator
> table in front of the standard operator table, or just leave the
> standard operator table out, that would meet the requirement that you
> want to "override" a standard operator.
>
> Cancai Cai  于2024年3月10日周日 17:43写道:
> >
> > It seems like I can't find a suitable way to solve this problem, can you
> > give me some suggestions?
> >
> > Best regards,
> > Cancai Cai
> >
> > Benchao Li  于2024年3月5日周二 19:26写道:
> >
> > > It's not a class loading problem, it's an operator lookup mechanism,
> > > which is a by design behavior, you can see details here[1]. If there
> > > are multiple candidates in the result of operator lookup, the first
> > > one will be returned.
> > >
> > > [1]
> > >
> https://github.com/apache/calcite/blob/d64ee0037db7abacdfcd4e774072f5e3fb8f2998/core/src/main/java/org/apache/calcite/sql/SqlUtil.java#L494-L517
> > >
> > > Cancai Cai  于2024年3月5日周二 16:53写道:
> > > >
> > > > Hello, everyone in the calcite community.
> > > >
> > > > I'm not sure if I should ask this question.
> > > >
> > > > In the process of fixing CALCITE-6259
> > > > <https://issues.apache.org/jira/browse/CALCITE-6259>, I found that I
> > > > defined a log10_MS function in SqlLibraryOperators to simulate the
> > > behavior
> > > > of the log function in mysql, but the
> ReturnTypes.DOUBLE_FORCE_NULLABLE I
> > > > set was not Effective, its return value still complies with the
> > > ReturnTypes
> > > > of the LOG10 function in SqlStdOperatorTable. It seems to be a class
> > > > loading conflict. I tried to modify sqlkind, but it didn’t take
> effect.
> > > Is
> > > > there any good way?
> > > >
> > > > Thank you all.
> > > >
> > > > Best wishes.
> > > > Cancai Cai
> > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
>
>
>
> --
>
> Best,
> Benchao Li
>


How to better support the adaptation of the same functions in different dialects

2024-03-14 Thread Cancai Cai
Hello, everyone in the calcite community.

Recently, I have been adapting the behavior of functions in different
dialects to the calcite community. At first, I used the method in
CALCITE-5831 <https://issues.apache.org/jira/browse/CALCITE-5831>[1], but
later I discovered that this The method is not very suitable. A function
may have multiple behaviors in different dialects (for example, the
behaviors of the log function in mysql, pg, and bigquery are somewhat
different). We may need to add some specific methods and
SqlLibraryOperators to these behaviors respectively. name.

In CALCITE-6325 <https://issues.apache.org/jira/browse/CALCITE-6325>[2],
tanclary proposed that @LibraryAnnotations can be used to optimize (I hope
I didn’t understand).  Julian also said before that it can be implemented
in the form of flag bits (but I don’t know how to use the database The type
flag is passed into Sqlfunctions)

I would like to ask everyone what you think?

Seriously, adaptation functions are not fun, but I want to help the
community improve this problem.

Best wishes,
Cancai Cai

[1] :https://issues.apache.org/jira/browse/CALCITE-5831
[2]: https://issues.apache.org/jira/browse/CALCITE-6325


Re: How to better support the adaptation of the same functions in different dialects

2024-03-14 Thread Cancai Cai
I've asked a lot of questions recently, and I hope I won't cause too much 
trouble to the community.

On 2024/03/14 10:32:09 Cancai Cai wrote:
> Hello, everyone in the calcite community.
> 
> Recently, I have been adapting the behavior of functions in different
> dialects to the calcite community. At first, I used the method in
> CALCITE-5831 <https://issues.apache.org/jira/browse/CALCITE-5831>[1], but
> later I discovered that this The method is not very suitable. A function
> may have multiple behaviors in different dialects (for example, the
> behaviors of the log function in mysql, pg, and bigquery are somewhat
> different). We may need to add some specific methods and
> SqlLibraryOperators to these behaviors respectively. name.
> 
> In CALCITE-6325 <https://issues.apache.org/jira/browse/CALCITE-6325>[2],
> tanclary proposed that @LibraryAnnotations can be used to optimize (I hope
> I didn’t understand).  Julian also said before that it can be implemented
> in the form of flag bits (but I don’t know how to use the database The type
> flag is passed into Sqlfunctions)
> 
> I would like to ask everyone what you think?
> 
> Seriously, adaptation functions are not fun, but I want to help the
> community improve this problem.
> 
> Best wishes,
> Cancai Cai
> 
> [1] :https://issues.apache.org/jira/browse/CALCITE-5831
> [2]: https://issues.apache.org/jira/browse/CALCITE-6325
> 


Re: PRs to merge in 1.27

2024-04-14 Thread Cancai Cai
Hi, Mihai,

Hello, thank you very much. Since I was quite busy some time ago, I didn't
have time to follow up on my PR. I would be very grateful if you would
review my PR and make suggestions for improvement. I will be very happy to
follow up and merge it into 1.37.

[1] [CALCITE-6325] 
[2 ][CALCITE-6300] 

Mihai Budiu  于2024年4月12日周五 01:03写道:

> Hello all contributors,
>
> The number of open PRs in the Calcite queue has grown a lot, and it is
> very easy to overlook a PR which hasn't been merged quickly. If you have a
> PR which you think is ready for 1.37, please let us know, even if it hasn't
> received any reviews.
>
> I know this from my personal experience, I have a handful of bug fixes
> which I was hoping to get into 1.37 which haven't been reviewed. I
> understand that too, I don't feel comfortable reviewing PRs for parts of
> the code which I don't know well. Perhaps the right way to do it is to
> explain in more details in the JIRA discussions what the PRs do, why, and
> how, to help guide the reviewers?
>
> Mihai
>


Optimize the type conversion of spark array function and map function in calcite

2024-04-15 Thread Cancai Cai
Hi, calcite community,

Recently, I am testing the map and array related functions of spark in
calcite. I found that in some cases, spark is a little different from our
understanding of type conversion.

For example

scala>  val df = spark.sql("select map_contains_key(map(1, 'a', 2, 'b'), 2.0)")
val df: org.apache.spark.sql.DataFrame = [map_contains_key(map(1, a,
2, b), 2.0): boolean]

scala> df.show()
+--+
|map_contains_key(map(1, a, 2, b), 2.0)|
+--+
|  true|
+--+

Mihai Budiu pointed out that similar processing may be done in Spark,

map_contains_key(map((Double)1, 'a', (Double)2, 'b'), 2.0)

We can't say that Spark is wrong, we should adapt to this situation, so I
think I might add an adjustTypeForMapContainsKey method to perform display
conversion on it, but this situation should not only exist in the
map_contain_keys method, we cannot guarantee map_concat that they are no
similar problems with other related functions. Therefore, we should
discover what common characteristics these functions have in type
conversion, and we should encapsulate them in a unified method instead of
adding a similar adjust method to each function.

I thought I should do this in three steps.

①Test various situations related to the map function and array function in
Spark, and raise jira if it is inconsistent with the spark behavior in
calcite

② Summarize the same characteristics of some functions and find out whether
there is any relationship

③For the same characteristics, use a method to encapsulate the type
conversion。

The above are my personal thoughts. I feel that this may be more conducive
to the maintenance of calcite code.

Finally, thank you for reading

Best wishes,

Cancai Cai


Re: Optimize the type conversion of spark array function and map function in calcite

2024-04-15 Thread Cancai Cai
  Thank you for your suggestion, spark sql is open source

Best wishes,
Cancai Cai

Mihai Budiu  于2024年4月16日周二 00:36写道:

> Is the spark SQL implementation open-source?
> If it is, the algorithm they use may be inferred from the code.
>
> Mihai
> ____
> From: Cancai Cai 
> Sent: Monday, April 15, 2024 8:10 AM
> To: dev@calcite.apache.org 
> Subject: Optimize the type conversion of spark array function and map
> function in calcite
>
> Hi, calcite community,
>
> Recently, I am testing the map and array related functions of spark in
> calcite. I found that in some cases, spark is a little different from our
> understanding of type conversion.
>
> For example
>
> scala>  val df = spark.sql("select map_contains_key(map(1, 'a', 2, 'b'),
> 2.0)")
> val df: org.apache.spark.sql.DataFrame = [map_contains_key(map(1, a,
> 2, b), 2.0): boolean]
>
> scala> df.show()
> +--+
> |map_contains_key(map(1, a, 2, b), 2.0)|
> +--+
> |  true|
> +--+
>
> Mihai Budiu pointed out that similar processing may be done in Spark,
>
> map_contains_key(map((Double)1, 'a', (Double)2, 'b'), 2.0)
>
> We can't say that Spark is wrong, we should adapt to this situation, so I
> think I might add an adjustTypeForMapContainsKey method to perform display
> conversion on it, but this situation should not only exist in the
> map_contain_keys method, we cannot guarantee map_concat that they are no
> similar problems with other related functions. Therefore, we should
> discover what common characteristics these functions have in type
> conversion, and we should encapsulate them in a unified method instead of
> adding a similar adjust method to each function.
>
> I thought I should do this in three steps.
>
> ①Test various situations related to the map function and array function in
> Spark, and raise jira if it is inconsistent with the spark behavior in
> calcite
>
> ② Summarize the same characteristics of some functions and find out whether
> there is any relationship
>
> ③For the same characteristics, use a method to encapsulate the type
> conversion。
>
> The above are my personal thoughts. I feel that this may be more conducive
> to the maintenance of calcite code.
>
> Finally, thank you for reading
>
> Best wishes,
>
> Cancai Cai
>


About calcite’s processing of metadata information

2024-05-08 Thread Cancai Cai
Hello everyone in the calcite community, I am doing some interesting things
with calcite. There is a question I would like to ask, but I have not seen
this part of the calcite code, so I don't know if it is appropriate to ask
it here.

When I use calcite's sqlValidator.validate interface to verify the metadata
of the hive table, it takes a long time. I think this is because I have a
large number of hive tables, 300,000 tables. I would like to ask if calcite
has the function of caching metadata information, or if I can do this:
cache the metadata information and then verify it.(I have done similar
things in spark sql, I don’t know if calcite can ).

Or do you have any good plans?

Thank you.

Best regards,
Cancai cai


Questions about numeric type conversion

2024-06-09 Thread Cancai Cai
Hello calcite community,
I would like to ask a question. Actually, I am not sure whether it is
appropriate to ask this question in the calcite community. It is mainly
about the database's handling of numeric types.

[10:53:45]TiDB root:test> explain select id = 8145308033243873280 from test;
+-+--+---+--++
| id  | estRows  | task  | access object|
operator info  |
+-+--+---+--++
| Projection_3| 1.00 | root  |  |
eq(cast(test.test.id, double BINARY), 8.145308033243873e+18)->Column#3 |
| └─IndexReader_7 | 1.00 | root  |  |
index:IndexFullScan_6  |
|   └─IndexFullScan_6 | 1.00 | cop[tikv] | table:test, index:id(id) |
keep order:false, stats:pseudo |
+-+--+---+--++
3 rows in set
Time: 0.004s
[10:54:08]TiDB root:test> select cast('8145308033243873281' as double) ;
+---+
| cast('8145308033243873281' as double) |
+---+
| 8.145308033243873e+18 |
+---+
1 row in set
Time: 0.008s
[10:54:15]TiDB root:test> select cast('8145308033243873280' as double) ;
+---+
| cast('8145308033243873280' as double) |
+---+
| 8.145308033243873e+18 |
+---+
1 row in set
Time: 0.003s

As shown above, some numeric types are converted to double type by default
in the underlying database, but this is not compatible with bigdecimal and
decimal types. Why is this?

These behaviors exist in both MySQL
<https://dev.mysql.com/doc/refman/8.0/en/type-conversion.html>(I put the
above example into mysql and it is also reproduced) and Spark. For example,
Spark's map_contains_key function also converts numeric types to double for
comparison.

I am not a database developer, I am curious about where the difficulty is
here, if someone can tell me, I will be very grateful and happy that I will
learn something new.

Best wishes,
Cancai Cai


Re: Questions about numeric type conversion

2024-06-09 Thread Cancai Cai
Hi Julian,

Thank you for your answer

Best regards,
Cancai Cai


Julian Hyde  于2024年6月10日周一 05:38写道:

> I believe that there are many overlapping problems, many of which are that
> each DBMS has its own semantics.
>
> In order to emulate a particular DBMS, we need to understand its
> semantics, document that semantics by writing tests, and then follow that
> semantics (if Calcite is in a mode that should follow those semantics).
>
> In my opinion, some DBMS overuse binary floating point types (which are
> often lossy when converting integer or decimal literals). Ideally Calcite
> would never convert to a binary floating point unless specifically asked to
> do so. But if we are emulating another DBMS, maybe we would need to do that.
>
> By default, Calcite’s numeric literals are arbitrary-precision decimal
> values. But some recent threads/issues have convinced me that we might need
> arbitrary precision binary numeric literals in some cases (e.g. after
> applying constant reduction).
>
>
>
> > On Jun 9, 2024, at 12:21 PM, Cancai Cai  wrote:
> >
> > Hello calcite community,
> > I would like to ask a question. Actually, I am not sure whether it is
> > appropriate to ask this question in the calcite community. It is mainly
> > about the database's handling of numeric types.
> >
> > [10:53:45]TiDB root:test> explain select id = 8145308033243873280 from
> test;
> >
> +-+--+---+--++
> > | id  | estRows  | task  | access object|
> > operator info  |
> >
> +-+--+---+--++
> > | Projection_3| 1.00 | root  |  |
> > eq(cast(test.test.id, double BINARY), 8.145308033243873e+18)->Column#3 |
> > | └─IndexReader_7 | 1.00 | root  |  |
> > index:IndexFullScan_6  |
> > |   └─IndexFullScan_6 | 1.00 | cop[tikv] | table:test, index:id(id) |
> > keep order:false, stats:pseudo |
> >
> +-+--+---+--++
> > 3 rows in set
> > Time: 0.004s
> > [10:54:08]TiDB root:test> select cast('8145308033243873281' as double) ;
> > +---+
> > | cast('8145308033243873281' as double) |
> > +---+
> > | 8.145308033243873e+18 |
> > +---+
> > 1 row in set
> > Time: 0.008s
> > [10:54:15]TiDB root:test> select cast('8145308033243873280' as double) ;
> > +---+
> > | cast('8145308033243873280' as double) |
> > +---+
> > | 8.145308033243873e+18 |
> > +---+
> > 1 row in set
> > Time: 0.003s
> >
> > As shown above, some numeric types are converted to double type by
> default
> > in the underlying database, but this is not compatible with bigdecimal
> and
> > decimal types. Why is this?
> >
> > These behaviors exist in both MySQL
> > <https://dev.mysql.com/doc/refman/8.0/en/type-conversion.html>(I put the
> > above example into mysql and it is also reproduced) and Spark. For
> example,
> > Spark's map_contains_key function also converts numeric types to double
> for
> > comparison.
> >
> > I am not a database developer, I am curious about where the difficulty is
> > here, if someone can tell me, I will be very grateful and happy that I
> will
> > learn something new.
> >
> > Best wishes,
> > Cancai Cai
>
>


About CommunityOverCode Asia 2024 Share on calcite

2024-06-18 Thread Cancai Cai
Hello, everyone in the calcite community.

CommunityOverCode Asia 2024 will be held in Hangzhou, China at the end of
July, and I am glad that I can go there to share some of my open source
experience.

I plan to share what I have learned from streampark and calcite.

Unfortunately, I rarely use calcite at work, so I can't share calcite
production practices, and I may be a novice in terms of calcite code
familiarity, so I am struggling to share which interesting features of
calcite I should share, such as the adaptation of arrow.

If there are any interesting new features that calcite is currently working
on, please tell me, I am very willing to learn, and then share and discuss
at CommunityOverCode Asia 2024

Best wishes,
Cancai Cai


Re: Function library for Amazon Redshift

2024-06-24 Thread Cancai Cai
Does the redshift official website fully explain compatibility with the 
postgres dialect?

If so, I think this is acceptable. Because we must consider that some negative 
tests of redshift functions are consistent with postgres, otherwise it may 
cause some trouble to users.

> 2024年6月22日 07:13,Julian Hyde  写道:
> 
> Redshift



Re: AS MEASURE

2024-07-02 Thread Cancai Cai
It looks cool. I am reading this paper, but I am not sure if I have the
ability to review this PR.

Maybe I can understand this paper in combination with PR.

Best wishes,
Cancai Cai

Julian Hyde  于2024年7月1日周一 04:30写道:

> I've added a pull request that supports the AS MEASURE construct so
> that you can define measures in a subquery or view (and then use them
> in an enclosing query).
>
> Can people please review https://github.com/apache/calcite/pull/3837.
>
> Not everything works yet (I describe future changes in
> https://issues.apache.org/jira/browse/CALCITE-4496) but a lot of
> things do. I would be especially interested in a discussion of how to
> implement the rewrites for measures - by means of planner rules,
> metadata (RelMdMeasure), or some combination.
>
> You can find the theory in our paper,
> https://dl.acm.org/doi/pdf/10.1145/3626246.3653374.
>
> Julian
>


Re: Draft: board report for 2024 Q2

2024-07-03 Thread Cancai Cai
+1, Thank you very much mihai. I learn a lot from every review of my PR.
Thank you Julian for answering jira questions quickly every time.

Julian Hyde  于2024年7月3日周三 21:36写道:

> PS Thank you to Mihai for taking the lion’s share of reviewing activity.
> The rest of us (myself included) need to step up and help him out.
>
> > On Jul 3, 2024, at 6:30 AM, Julian Hyde  wrote:
> >
> > +1.
> >
> > Thank you for mentioning my paper.
> >
> > Julian
> >
> >> On Jul 3, 2024, at 5:19 AM, Francis Chuang 
> wrote:
> >>
> >> +1, great work, Benchao!
> >>
>  On 3/07/2024 10:09 pm, Benchao Li wrote:
> >>> Hello,
> >>> Below you can find a draft of this quarter's board report. I plan to
> >>> submit the final version next Tuesday (Jul 9, 2024).
> >>> Please let me know if you have any additions or corrections.
> >>> ## Description:
> >>> Apache Calcite is a highly customizable framework for parsing and
> planning
> >>> queries on data in a wide variety of formats. It allows database-like
> >>> access,
> >>> and in particular a SQL interface and advanced query optimization, for
> data
> >>> not residing in a traditional database.
> >>> Avatica is a sub-project within Calcite and provides a framework for
> >>> building
> >>> local and remote JDBC and ODBC database drivers. Avatica has an
> independent
> >>> release schedule and its own repository.
> >>> ## Project Status:
> >>> There are no issues requiring board attention.
> >>> ## Membership Data:
> >>> Apache Calcite was founded 2015-10-22 (9 years ago)
> >>> There are currently 74 committers and 28 PMC members in this project.
> >>> The Committer-to-PMC ratio is roughly 5:2.
> >>> Community changes, past quarter:
> >>> - No new PMC members. Last addition was Sergey Nuyanzin on 2024-03-05.
> >>> - No new committers. Last addition was Hongyu Guo on 2023-11-03.
> >>> ## Project Activity:
> >>> Apache Calcite Avatica 1.25.0 was released on 2024-04-05. It is a
> routine
> >>> release featuring support for JDK 21, Gradle 8.5 and several bug fixes.
> >>> Apache Calcite 1.37.0 was released on 2024-05-04. It contains
> contributions
> >>> from 46 contributors, and resolves 138 issues. It introduces Apache
> Arrow
> >>> adapter and StarRocks dialect, adds support for lambda expressions in
> SQL and
> >>> ‘Must-filter’ columns. For table function calls it is now possible to
> use them
> >>> without TABLE() wrapper in FROM. Furthermore, there is support for
> optional
> >>> FORMAT of CAST operator from SQL:2016 and more than 15 new SQL
> functions in
> >>> various libraries such as BigQuery, PostgreSQL and Spark.
> >>> Besides releases, it's worth to mention that Julian Hyde presented
> "Measures
> >>> in SQL" both in SF Distributed Systems Meetup in downtown SF on
> Wednesday May
> >>> 22nd, and SIGMOD 2024 conference in Santiago, Chile, which has been
> >>> implemented in Calcite.
> >>> ## Community Health:
> >>> The community maintains a healthy status, previously it's super
> healthy. The
> >>> reason is that we did not invite new committers for more than 6
> months. There
> >>> are a few new faces in the community, hopefully we'll invite new
> committers in
> >>> the near future.
> >>> Most of the statistics slightly decreased compared to last quarter
> (dev@
> >>> decreased by 39%, issues@ decreased by 22%, commits decreased by 31%,
> code
> >>> contributors decreased by 32%, active reviewers decreased by 36.8%).
> The
> >>> reason I can see is there is less activity in supporting new SQL
> functions in
> >>> various libraries recently, which is much more active in previous
> quarters.
> >>> The number of non-committer (contributor) commits per month:
> >>> +--+---+-+
> >>> | year | month | contributor_commits |
> >>> +--+---+-+
> >>> | 2024 | 4 |  13 |
> >>> | 2024 | 5 |  11 |
> >>> | 2024 | 6 |  15 |
> >>> +--+---+-+
> >>> The number of active reviewers per month:
> >>> +--+---+--+
> >>> | year | month | active_reviewers |
> >>> +--+---+--+
> >>> | 2024 | 4 |5 |
> >>> | 2024 | 5 |4 |
> >>> | 2024 | 6 |3 |
> >>> +--+---+--+
> >>> Top reviewers in the last 3 months:
> >>> +--+-+
> >>> | committer| reviews |
> >>> +--+-+
> >>> | Mihai Budiu  |  19 |
> >>> | NobiGo  |   9 |
> >>> | Julian Hyde|   3 |
> >>> +--+-+
> >>> Best,
> >>> Benchao Li
>


Re: [ANNOUNCE] Mihai Budiu joins Calcite PMC

2024-07-25 Thread Cancai Cai
Congratulations, Mihai!
Thank you Mihai for your help all the time

> 2024年7月26日 10:43,Francis Chuang  写道:
> 
> Congratulations, Mihai!
> 
> On 26/07/2024 12:37 pm, Benchao Li wrote:
>> I am pleased to announce that Mihai has accepted an invitation to
>> join the Calcite PMC. Mihai has been a consistent and helpful figure
>> in the Calcite community for which we are very grateful. We look
>> forward to the continued contributions and support.
>> Please join me in congratulating Mihai!
>> Benchao (on behalf of the Calcite PMC)



Re: [ANNOUNCE] New committer: Norman Jordan

2024-08-14 Thread Cancai Cai
Congratulations Norman!

> 2024年8月14日 20:23,Alessandro Solimando  写道:
> 
> Congratulations, Norman, very well deserved!
> 
> On Wed, Aug 14, 2024, 13:43 Benchao Li  wrote:
> 
>> Congratulations Norman!
>> 
>> Ruben Q L  于2024年8月14日周三 19:15写道:
>>> 
>>> Congratulations Norman!!!
>>> 
>>> 
>>> On Wed, Aug 14, 2024 at 12:05 PM Michael Mior  wrote:
>>> 
 Congratulations Norman!
 --
 Michael Mior
 mm...@apache.org
 
 
 On Wed, Aug 14, 2024 at 6:09 AM Stamatis Zampetakis >> 
 wrote:
 
> Apache Calcite's Project Management Committee (PMC) has invited
>> Norman
> Jordan to
> become a committer, and we are pleased to announce that they have
 accepted.
> 
> Norman has contributed some high quality patches to Calcite over the
> past few months bringing many improvements and fixes around Postgres
> and Redshift functions.
> 
> Norman, welcome, thank you for your contributions, and we look
>> forward to
> your
> further interactions with the community! If you wish, please feel
>> free to
> tell
> us more about yourself and what you are working on.
> 
> As your first commit, please add yourself to the contributors list
>> [1]
> and the community page will re-generate [2].
> 
> Stamatis (on behalf of the Apache Calcite PMC)
> 
> [1]
> 
>> https://github.com/apache/calcite/blob/main/site/_data/contributors.yml
> 
> [2] https://calcite.apache.org/community/#project-members
> 
 
>> 
>> 
>> 
>> --
>> 
>> Best,
>> Benchao Li
>> 



Re: Query planner structure

2024-08-15 Thread Cancai Cai
This is what I have always wanted to do. It is very challenging, but also very 
meaningful. 

In fact, other projects have already taken this step, such as datafusion. I 
have an idea whether calcite can be combined with datafusion. 

Calcite focuses on query optimization, while datafusion focuses on execution. 
Perhaps the two can complement each other. I know that someone in cmu is making 
similar attempts, such as optd.

I am also making similar attempts.

[1]  https://github.com/apache/datafusion
[2]  https://github.com/cmu-db/optd

> 2024年8月15日 14:49,Stamatis Zampetakis  写道:
> 
> Hey Mihai,
> 
> Putting the rules together and building a good query planner is a
> pretty challenging task. I don't know if there is a perfect solution
> out there but there are certainly some available planners that you can
> have a look at.
> 
> Apache Hive's logical optimizer [1] is built using Calcite and has
> been running real production workloads for many years now. In terms of
> results you can get an idea of the produced plans for TPC-DS queries
> by looking at the plans in [2].
> 
> Best,
> Stamatis
> 
> [1] 
> https://github.com/apache/hive/blob/b09d76e68bfba6be19733d864b3207f95265d11f/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1609
> [2] 
> https://github.com/apache/hive/tree/b09d76e68bfba6be19733d864b3207f95265d11f/ql/src/test/results/clientpositive/perf/tpcds30tb/tez
> 
> On Thu, Aug 15, 2024 at 6:02 AM suibianwanwan <1597226...@qq.com.invalid> 
> wrote:
>> 
>> I'm trying to reproduce the problem, but the result of my convert Query 
>> doesn't have the LogicalCorrelate.
>> 
>> 
>> So I tried to create a test in calcite org.apache.calcite.tools.PlannerTest 
>> as follows:
>> 
>> 
>>     Planner planner = getPlanner(null);
>>     final String sql = "SELECT 3 in (SELECT \"deptno\" FROM 
>> \"emps\")";
>>     SqlNode parse = planner.parse(sql);
>>     SqlNode validate = planner.validate(parse);
>>     RelNode convert = planner.rel(validate).rel;
>> 
>> relnode as follows:
>> LogicalProject.NONE.[](input=LogicalValues#6,exprs=[IN(3, {
>> LogicalProject(deptno=[$1])
>>   LogicalTableScan(table=[[hr, emps]])
>> })])
>> 
>> 
>> 
>> Do you have a way to reproduce this in calcite?
>> 
>> 
>> Regards,
>> suibianwanwan
>> -- Original --
>> From:
>> "dev"
>> 
>> > Date: Thu, Aug 15, 2024 06:23 AM
>> To: "dev"> 
>> Subject: Re: Re:Query planner structure
>> 
>> 
>> 
>> I have tried this approach, but the decorrelator fails to decorrelate many 
>> queries. Here is a simple query where after decorrelation there are still 
>> LogicalCorrelate nodes left:
>> 
>> SELECT 3 in (SELECT empno FROM emp)
>> 
>> Maybe I am missing some optimization steps?
>> 
>> From my reading of the code, the Calcite decorrelator seems to be a 
>> heuristic decorrelator, which only handles a few types of patterns of 
>> correlated subqueries. If your query doesn't fall into one of these 
>> patterns, the decorrelator leaves it unchanged.
>> 
>> For example, this query, after converting it to use LogicalCorrelate as you 
>> have described, becomes:
>> 
>>     LogicalProject(EXPR$0=[IS NOT NULL($1)]), id = 211
>>   LogicalCorrelate(correlation=[$cor0], 
>> joinType=[left], requiredColumns=[{}]), id = 213
>>     LogicalValues(tuples=[[{ 0 }]]), 
>> id = 180
>>     LogicalAggregate(group=[{0}]), id 
>> = 207
>>   
>> LogicalProject(cs=[true]), id = 205
>>     
>> LogicalFilter(condition=[=(3, $0)]), id = 203
>>  
>>  LogicalProject(EMPNO=[$0]), id = 201
>>    
>>  LogicalTableScan(table=[[schema, EMP]]), id = 182
>> 
>> The call to RelDecorrelator.decorrelateQuery(rel, builder) cannot 
>> decorrelate this query.
>> 
>> Do you have a solution that handles this case?
>> 
>> BTW: I also think that the FILTER_INTO_JOIN rule is unsound , e.g.: 
>> https://issues.apache.org/jira/browse/CALCITE-5627
>> 
>> Thank you,
>> Mihai
>> 
>> From: suibianwanwan33 > Sent: Tuesday, August 13, 2024 10:31 PM
>> To: dev > Subject: Re:Query planner structure
>> 
>> Hi, this is one of the classic optimization cases in my work.
>> First you need to add all CORRELATE optimizations such as 
>> FILTER_SUB_QUERY_TO_CORRELATE in a HepPlanner, after that you need to do a 
>> decorrelateQuery, and finally use a HepPlanner to add the 
>> JOIN_CONDITION_PUSH and the FILTER_INTO_JOIN. my guess is that the Cartesian 
>> product of this query will be optimized
>> 
>> 
>> 
>> 
>> -- Original --
>> From:   
>>  
>> "dev"   

Some doubts about date functions

2024-08-26 Thread Cancai Cai
Hello, everyone in the Calcite community.

Recently, I am working on https://issues.apache.org/jira/browse/CALCITE-6544 of 
Calcite. The date function seems to be a little different from other functions. 
I can't find where the final results of date_add and date_sub are calculated. I 
only see various cast conversions, which makes me a little confused.

I don't know where I should limit the final result set. Can you give me some 
advice or some jira cases to learn from?

Best wishes,
Cancai Cai

Re: Some doubts about date functions

2024-08-26 Thread Cancai Cai
Thanks for your reply.

Yes, I also located here, and I finally debugged to 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L3802
 but I don’t know how to continue.

I find it hard to believe that 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/util/BuiltInMethod.java#L517
 is the processing of the data_add function, because its name is add_months not 
date_add

> 2024年8月26日 23:19,Norman Jordan  写道:
> 
> this leads to here:
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/util/BuiltInMethod.java#L517
> [https://opengraph.githubassets.com/6af4884a9b80dd1ac5b81e610caea8c9e34bcc847d0e172706a503b58f4f1557/apache/calcite]
> calcite/core/src/main/java/org/apache/calcite/util/BuiltInMethod.java at main 
> · 



Re: Some doubts about date functions

2024-08-28 Thread Cancai Cai
I tried many ways to write it. As Julian said, Maybe BigQuery's DATE data type 
has a different range than Calcite's. I am not sure if I can fix this problem. 

If I adapt some other date functions of Spark in the future, do I need to fix 
this problem first? For example, 
https://issues.apache.org/jira/browse/CALCITE-6527

Best wishes,
Cancai Cai

> 2024年8月27日 05:11,Norman Jordan  写道:
> 
> Re https://issues.apache.org/jira/browse/CALCITE-6544. Is it possible that 
> BigQuery’s DATE_ADD can return BC dates because BigQuery’s DATE data type has 
> a different range of allowable values than Calcite’s? In other words, it’s 
> not a difference with the DATE_ADD function.