In an open source project the core is "de facto": people decide what is
important to them, add tests, and fix the code to match the tests.

The core is well defined: it is the test suite. If the core doesn't match
your needs, contributions are welcome.

Julian



> On Dec 2, 2024, at 11:52 AM, Mihai Budiu <[email protected]> wrote:
> 
> The difficulty will be in defining the "core".
> 
> If you want 100% fidelity the goals are clear. If you want 99% fidelity you
> have to define what is in the 1% left over. That's why I asked Yanjing Wang
> to set up a goal that needs to be achieved.
> 
> Even if you leave out all functions, just the type system mismatches and the
> behavior of casts make this a difficult job. Today the static and dynamic
> behavior of casts seems to be largely built into Calcite; to make that
> flexible it would have to move somehow into the type system, so that each
> dialect can customize the way casts work. We have seen this, for example,
> in the handling of casts to/from BINARY:
> https://issues.apache.org/jira/browse/CALCITE-6210
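> 
> To make that concrete, here is a hypothetical sketch (this is not an
> existing Calcite API) of what a pluggable cast policy alongside the type
> system might look like:
> 
> // RelDataType (org.apache.calcite.rel.type.RelDataType) is Calcite's real
> // type descriptor; the rest of this interface is invented for the sketch.
> interface CastPolicy {
>   /** Whether CAST(value AS target) is legal at all in this dialect. */
>   boolean canCast(RelDataType source, RelDataType target);
>   /** What an out-of-range or malformed value does at runtime. */
>   Overflow overflow(RelDataType source, RelDataType target);
>   enum Overflow { ERROR, RETURN_NULL, TRUNCATE, SATURATE }
> }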
> 
> I suspect the "high severity" thing happens often because this is the JIRA 
> default, and people don't bother to change it.
> 
> Mihai
> 
> ________________________________
> From: Julian Hyde <[email protected]>
> Sent: Monday, December 2, 2024 11:40 AM
> To: [email protected] <[email protected]>
> Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
> 
> I agree with Mihai that it's not practical to translate dialect-to-dialect
> with 100% fidelity. You get defeated by the edge cases — values of DECIMAL
> that are valid on one DB but not another, or what happens when the 'offset'
> argument of the SUBSTRING function is negative.
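> 
> To make the SUBSTRING point concrete (behaviors recalled from the MySQL
> and PostgreSQL docs, worth double-checking):
> 
> // MySQL counts a negative offset from the end of the string:
> //   SELECT SUBSTRING('hello', -3);            -- returns 'llo'
> // PostgreSQL treats a negative offset as a position before the first
> // character, and clips the requested range to the string:
> //   SELECT substring('hello' from -3 for 6);  -- returns 'he'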
> 
> But the core of the language is a lot more valuable than the edges. We can do 
> a good job at the core.
> 
> Let’s use our tests to define the core, keep it working, and continually 
> extend it.
> 
> Tests are crucial, and should be the heart of this initiative. If people want 
> Calcite to be a better SQL translator, they should invest in improving the 
> tests. (Not just a stream of band-aid PRs.)
> 
> You may notice that I get exasperated when someone logs a bug at the
> boundaries and claims that it is high severity. For example, one that wants
> us to handle dates earlier than 1 BCE. I know why this happens — your
> customer has logged a bug because their compliance suite has a date that is
> 2 BCE. These cases exist because of QA engineers testing up to the
> boundaries — and logging bugs that are handled by developers of another
> product and *their* QA engineers. But let's keep some perspective — these
> cases are filed, and fixed, only to make QA engineers happy. Let's focus on
> the core.
> 
>>> My claim was that it's impossible to even implement a translation between 
>>> any two dialects such that the resulting program provides the exact same 
>>> results as the source program.
> 
> Not impossible. Google acquired a company, CompilerWorks, that models the 
> type system and built-in functions of various SQL dialects and maps between 
> them. It is a lot of work, and requires a lot of testing. We cannot get to 
> that place with the limited resources of an open source project, but we can 
> asymptotically approach it if we are clever.
> 
> Julian
> 
> 
> 
>> On Dec 2, 2024, at 10:05 AM, Mihai Budiu <[email protected]> wrote:
>> 
>> My claim was that it's impossible to even implement a translation between 
>> any two dialects such that the resulting program provides the exact same 
>> results as the source program.
>> 
>> Clearly, I don't know all SQL dialects, but I think this is true for all 
>> major dialects.
>> 
>> Mihai
>> 
>> ________________________________
>> From: Yanjing Wang <[email protected]>
>> Sent: Sunday, December 1, 2024 7:10 PM
>> To: [email protected] <[email protected]>
>> Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
>> 
>> Thank you, Mihai, for your detailed insights.
>> 
>> Implementing translations for all dialects indeed requires a significant
>> amount of work and can be quite challenging to complete. Would it be
>> possible for us to create an example that demonstrates how to implement
>> translation between two specific dialects and materialized view
>> substitution from scratch? This example could potentially be enhanced with
>> contributions from various companies and be friendly to newbies.
>> 
>> 
>> Thank you, Walaa, for your recommendation. I understand that Coral uses
>> ANTLR4 for parsing and translating from ASTNode to Calcite SqlNode. Do you
>> find this approach easier than defining a Calcite FTL file for unsupported
>> syntax?
>> 
>> Walaa Eldin Moustafa <[email protected]> 于2024年11月28日周四 01:27写道:
>> 
>>> You may consider looking into Coral [1], which is based on Calcite but
>>> extends it with the translation requirements in mind, treating them as
>>> first-class problems.
>>> 
>>> [1] https://github.com/linkedin/coral
>>> 
>>> Thanks,
>>> Walaa.
>>> 
>>> 
>>> On Wed, Nov 27, 2024 at 9:20 AM Mihai Budiu <[email protected]> wrote:
>>> 
>>>> I think that the task of translation between SQL dialects is hopeless.
>>>> Consider just the type DECIMAL:
>>>> 
>>>> 
>>>> * Each dialect supports a different range of DECIMAL types, specified
>>>>   by scale and precision.
>>>> * Some dialects support negative scales, and some support scales >
>>>>   precision.
>>>> * Some dialects, like Postgres, have infinite values in DECIMAL. This
>>>>   means that DECIMAL arithmetic can never cause a runtime error in
>>>>   Postgres, unlike in all other dialects.
>>>> * The rounding rules for converting between DECIMAL and other types
>>>>   differ between dialects.
>>>> * The typing rules for DECIMAL arithmetic differ between dialects: what
>>>>   is the type of a/b when a and b are DECIMALs? (See the sketch after
>>>>   this list.)
>>>> * The implicit typing of literals differs between dialects: is 1.0e0 a
>>>>   DECIMAL or some floating-point value?
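>>>> 
>>>> As a concrete instance of the typing-rules point above (formulas
>>>> recalled from the MySQL and SQL Server docs, so worth verifying):
>>>> 
>>>> // Dividing DECIMAL(10,2) by DECIMAL(10,2), i.e. p1 = p2 = 10, s1 = s2 = 2:
>>>> // MySQL:      result scale = s1 + div_precision_increment (default 4),
>>>> //             so the quotient has scale 6.
>>>> // SQL Server: scale = max(6, s1 + p2 + 1) = 13, and
>>>> //             precision = p1 - s1 + s2 + scale = 23,
>>>> //             so the quotient is DECIMAL(23,13).
>>>> // The same division therefore yields differently typed, and hence
>>>> // differently rounded, results in the two dialects.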
>>>> 
>>>> The conclusion is that a SQL dialect can express many programs that
>>>> cannot be expressed at all in other SQL dialects. What is a translator
>>>> supposed to do with such programs?
>>>> 
>>>> This is a set of problems for just one type; when you consider the full
>>>> set of supported types, the problem is exponentially larger.
>>>> 
>>>> Just look at the differences between functions across dialects:
>>>> https://calcite.apache.org/docs/reference.html
>>>> Calcite has 4 versions of CONCAT_WS, with subtle semantic differences.
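>>>> 
>>>> One such subtlety, using MySQL semantics (which I am fairly sure of; the
>>>> other variants differ in details such as null-separator handling):
>>>> 
>>>> // MySQL skips NULL arguments but returns NULL for a NULL separator:
>>>> //   SELECT CONCAT_WS(',', 'a', NULL, 'b');  -- returns 'a,b'
>>>> //   SELECT CONCAT_WS(NULL, 'a', 'b');       -- returns NULL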
>>>> 
>>>> Are you aware of any tool, anywhere, that translates between two fixed
>>>> SQL dialects while preserving the program's exact behavior in all cases?
>>>> Building a tool for all possible dialects is again exponentially harder.
>>>> 
>>>> Since we have established that exact semantics-preserving transformation
>>>> of SQL programs between dialects is impossible, you need to set up a
>>>> goal for the translation effort. What do you really hope to achieve?
>>>> Then we can discuss how it can be done.
>>>> 
>>>> Mihai
>>>> 
>>>> ________________________________
>>>> From: Yanjing Wang <[email protected]>
>>>> Sent: Wednesday, November 27, 2024 1:02 AM
>>>> To: [email protected] <[email protected]>
>>>> Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
>>>> 
>>>> Hello, this discussion has been ongoing for a week. Let's move it
>>>> forward. Does anyone else have any suggestions?
>>>> 
>>>> Yanjing Wang <[email protected]> 于2024年11月19日周二 14:47写道:
>>>> 
>>>>> 1. RelToSqlConverterTest
>>>>> The class name implies that it tests conversion from RelNode to SQL,
>>>>> but now its RelNode is paired with target SQL for several dialects. It
>>>>> is difficult for me to understand a test case like this one:
>>>>> 
>>>>> @Test void testNullCollation() {
>>>>>   final String query = "select * from \"product\" order by \"brand_name\"";
>>>>>   final String expected = "SELECT *\n"
>>>>>       + "FROM \"foodmart\".\"product\"\n"
>>>>>       + "ORDER BY \"brand_name\"";
>>>>>   final String sparkExpected = "SELECT *\n"
>>>>>       + "FROM `foodmart`.`product`\n"
>>>>>       + "ORDER BY `brand_name` NULLS LAST";
>>>>>   sql(query)
>>>>>       .withPresto().ok(expected)
>>>>>       .withSpark().ok(sparkExpected);
>>>>> }
>>>>> 
>>>>> 
>>>>> Why does the Spark SQL have 'NULLS LAST' at the end? That information
>>>>> is missing if we don't add a source RelNode or a source dialect.
>>>>> 
>>>>> 2. Dialect-to-dialect translation
>>>>> I think it's necessary. Dialect translation and materialized view
>>>>> substitution are common in the big data domain, and it would be
>>>>> beneficial to make Calcite more user-friendly for these scenarios.
>>>>> Could we create end-to-end test cases that start with the source SQL of
>>>>> one dialect and end with the target SQL of another (or the same)
>>>>> dialect? We could also include user-defined materialized views in the
>>>>> process and perform result comparison.
>>>>> 
>>>>> Julian Hyde <[email protected]> 于2024年11月19日周二 07:21写道:
>>>>> 
>>>>>> A recent case, https://issues.apache.org/jira/browse/CALCITE-6693,
>>>>>> "Add Source SQL Dialect to RelToSqlConverterTest", implies that people
>>>>>> are using Calcite to translate SQL from one dialect to another. The
>>>>>> test wanted to test translating a SQL string from Presto to Redshift.
>>>>>> I pushed back on that case (and its PR) because that test is for
>>>>>> translating RelNode to a SQL dialect, not about handling source
>>>>>> dialects.
>>>>>> 
>>>>>> Dialect-to-dialect translation is undoubtedly something that people
>>>>>> do with Calcite. I think we should recognize that fact, and document
>>>>>> how someone can use Calcite as a translator (for example, along the
>>>>>> lines of the sketch below). When we have documented it, we can also
>>>>>> add some tests.
>>>>>> 
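>>>>>> As a starting point, here is a minimal sketch of the parse-and-unparse
>>>>>> path (the dialect choices are just examples, and a real translator
>>>>>> would also validate the query and handle the operator and type
>>>>>> differences discussed earlier in this thread):
>>>>>> 
>>>>>> import org.apache.calcite.config.Lex;
>>>>>> import org.apache.calcite.sql.SqlNode;
>>>>>> import org.apache.calcite.sql.dialect.PostgresqlSqlDialect;
>>>>>> import org.apache.calcite.sql.parser.SqlParser;
>>>>>> 
>>>>>> public class Translate {
>>>>>>   public static void main(String[] args) throws Exception {
>>>>>>     String mysqlSql = "SELECT `brand_name` FROM `product` LIMIT 10";
>>>>>>     // Parse using the source dialect's lexical conventions.
>>>>>>     SqlParser parser = SqlParser.create(mysqlSql,
>>>>>>         SqlParser.config().withLex(Lex.MYSQL));
>>>>>>     SqlNode ast = parser.parseQuery();
>>>>>>     // Unparse the same AST using the target dialect's conventions.
>>>>>>     System.out.println(
>>>>>>         ast.toSqlString(PostgresqlSqlDialect.DEFAULT).getSql());
>>>>>>   }
>>>>>> }
>>>>>> 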
>>>>>> I am also worried about our dialect tests in general. The surface
>>>>>> area to be tested is huge, and the tests are added haphazardly, so
>>>>>> while many cases are tested there is a much greater set of cases that
>>>>>> are not tested. Consider, for example, how
>>>>>> testSelectQueryWithGroupByEmpty [1] tests against MySQL, Presto and
>>>>>> StarRocks but not against BigQuery, Snowflake or Postgres. If we want
>>>>>> our SQL dialect support to be high quality, we have to find a way to
>>>>>> improve the coverage of our tests. I logged
>>>>>> https://issues.apache.org/jira/browse/CALCITE-5529 with some ideas,
>>>>>> but I need help implementing it.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> [1] https://github.com/apache/calcite/blob/f2ec11fe7e23ecf2db903bc02c40609242993aad/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java#L577
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>> 
> 
