In an open source project the core is "de facto": people decide what is important to them, add tests, and fix the code to match the tests.
The core is well-defined: it is the test suite. If the core doesn't match your
needs, contributions are welcome.

Julian

> On Dec 2, 2024, at 11:52 AM, Mihai Budiu <[email protected]> wrote:
>
> The difficulty will be in defining the "core".
>
> If you want 100% fidelity the goals are clear. If you want 99% fidelity you
> have to define what is in the 1% left over. That's why I asked Yanjing Wang
> to set up a goal that needs to be achieved.
>
> Even if you leave out all functions, just the type system mismatches and the
> behavior of casts make this a difficult job. Today the static and dynamic
> behavior of casts seems to be to a large degree built into Calcite, but to
> make that flexible it would have to move somehow into the type system, so
> that each dialect can customize the way casts work. We have seen this, for
> example, in the handling of casts to/from BINARY:
> https://issues.apache.org/jira/browse/CALCITE-6210
>
> I suspect the "high severity" thing happens often because this is the JIRA
> default, and people don't bother to change it.
>
> Mihai
>
> ________________________________
> From: Julian Hyde <[email protected]>
> Sent: Monday, December 2, 2024 11:40 AM
> To: [email protected] <[email protected]>
> Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
>
> I agree with Mihai that it's not practical to translate dialect-to-dialect
> with 100% fidelity. You get defeated by the edge cases — values of DECIMAL
> that are valid on one DB but not another, or what happens when the 'offset'
> argument of the SUBSTRING function is negative.
>
> But the core of the language is a lot more valuable than the edges. We can
> do a good job at the core.
>
> Let's use our tests to define the core, keep it working, and continually
> extend it.
>
> Tests are crucial, and should be the heart of this initiative. If people
> want Calcite to be a better SQL translator, they should invest in improving
> the tests. (Not just a stream of band-aid PRs.)
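To make the SUBSTRING edge case concrete: dialects disagree on what a negative offset means. The sketch below contrasts two common interpretations in plain Java. The method names are made up for illustration, and the semantics are approximations of the MySQL and Postgres behaviors, not authoritative definitions of either engine.

```java
public class SubstringDialects {
  // MySQL-style: a negative position counts backwards from the end of the
  // string; position 0, or a position past either end, yields "".
  static String mysqlStyle(String s, int pos, int len) {
    int start = pos > 0 ? pos - 1 : s.length() + pos;
    if (pos == 0 || start < 0 || start >= s.length() || len <= 0) {
      return "";
    }
    return s.substring(start, Math.min(start + len, s.length()));
  }

  // Postgres-style: the requested range [from, from+count-1] is intersected
  // with the valid positions [1, length]; a negative start is not an error.
  static String postgresStyle(String s, int from, int count) {
    int lo = Math.max(from, 1);
    int hi = Math.min(from + count - 1, s.length());
    return hi < lo ? "" : s.substring(lo - 1, hi);
  }

  public static void main(String[] args) {
    // The same call, three characters starting at offset -2, disagrees:
    System.out.println(mysqlStyle("hello", -2, 3));     // "lo"
    System.out.println(postgresStyle("hello", -2, 3));  // ""
  }
}
```

A translator that ignores this difference silently changes query results, which is exactly the kind of edge the thread argues cannot all be covered.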
>
> You may notice that I get exasperated when someone logs a bug at the
> boundaries and claims that it is high severity. For example, a bug that
> wants us to handle dates earlier than 1 BCE. I know why this happens — your
> customer has logged a bug because their compliance suite has a date that is
> 2 BCE. These cases exist because of QA engineers testing up to the
> boundaries — and logging bugs that are handled by developers of another
> product and *their* QA engineers. But let's keep some perspective — these
> cases are filed, and fixed, only to make QA engineers happy. Let's focus on
> the core.
>
>>> My claim was that it's impossible to even implement a translation between
>>> any two dialects such that the resulting program provides the exact same
>>> results as the source program.
>
> Not impossible. Google acquired a company, CompilerWorks, that models the
> type system and built-in functions of various SQL dialects and maps between
> them. It is a lot of work, and requires a lot of testing. We cannot get to
> that place with the limited resources of an open source project, but we can
> asymptotically approach it if we are clever.
>
> Julian
>
>
>> On Dec 2, 2024, at 10:05 AM, Mihai Budiu <[email protected]> wrote:
>>
>> My claim was that it's impossible to even implement a translation between
>> any two dialects such that the resulting program provides the exact same
>> results as the source program.
>>
>> Clearly, I don't know all SQL dialects, but I think this is true for all
>> major dialects.
>>
>> Mihai
>>
>> ________________________________
>> From: Yanjing Wang <[email protected]>
>> Sent: Sunday, December 1, 2024 7:10 PM
>> To: [email protected] <[email protected]>
>> Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
>>
>> Thank you, Mihai, for your detailed insights.
>>
>> Implementing translations for all dialects indeed requires a significant
>> amount of work and can be quite challenging to complete.
>> Would it be possible for us to create an example that demonstrates how to
>> implement translation between two specific dialects, and materialized view
>> substitution, from scratch? This example could potentially be enhanced with
>> contributions from various companies and be friendly to newbies.
>>
>> Thank you, Walaa, for your recommendation. I understand that Coral uses
>> ANTLR4 for parsing and translating from ASTNode to Calcite SqlNode. Do you
>> find this approach easier compared to defining a Calcite FTL file for
>> unsupported syntax?
>>
>> On Thu, Nov 28, 2024 at 01:27, Walaa Eldin Moustafa
>> <[email protected]> wrote:
>>
>>> You may consider looking into Coral [1], which is based on Calcite but
>>> extends it with the translation requirements in mind, and as first-class
>>> problems.
>>>
>>> [1] https://github.com/linkedin/coral
>>>
>>> Thanks,
>>> Walaa.
>>>
>>>
>>> On Wed, Nov 27, 2024 at 9:20 AM Mihai Budiu <[email protected]> wrote:
>>>
>>>> I think that the task of translation between SQL dialects is hopeless.
>>>> Consider just the type DECIMAL:
>>>>
>>>> * Each dialect supports a different range of DECIMAL types, specified by
>>>>   scale and precision.
>>>> * Some dialects support negative scales, and some support scales greater
>>>>   than precision.
>>>> * Some dialects, like Postgres, have infinite values in DECIMAL. This
>>>>   means that DECIMAL arithmetic can never cause a runtime error in
>>>>   Postgres, unlike all other dialects.
>>>> * The rounding rules for converting between DECIMAL and other types
>>>>   differ between dialects.
>>>> * The typing rules for DECIMAL arithmetic differ between dialects: what
>>>>   is the type of a/b when a and b are DECIMALs?
>>>> * The implicit typing of literals differs between dialects: is 1.0e0 a
>>>>   DECIMAL or some floating point value?
>>>>
>>>> The conclusion is that a SQL dialect can express many programs that
>>>> cannot be expressed at all in other SQL dialects.
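The a/b typing bullet above can be felt even in plain Java: java.math.BigDecimal refuses to divide without an explicit scale policy, which is precisely the decision each SQL dialect bakes into its DECIMAL typing rules, each differently. A minimal sketch; the scale values here are arbitrary illustrations, not any vendor's actual rule:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalDivide {
  // The caller must pick a result scale and rounding mode; SQL dialects
  // hard-wire different choices like these into their DECIMAL division type.
  static BigDecimal divide(BigDecimal a, BigDecimal b, int resultScale) {
    return a.divide(b, resultScale, RoundingMode.HALF_UP);
  }

  public static void main(String[] args) {
    BigDecimal one = new BigDecimal("1");
    BigDecimal three = new BigDecimal("3");
    // 1/3 has no exact decimal expansion, so the chosen scale decides the value:
    System.out.println(divide(one, three, 6)); // 0.333333
    System.out.println(divide(one, three, 2)); // 0.33
    // With no policy at all, Java throws, just as some dialects reject the
    // expression while others silently pick a scale:
    try {
      one.divide(three);
    } catch (ArithmeticException e) {
      System.out.println("exact division impossible: " + e.getMessage());
    }
  }
}
```

Two engines that pick different result scales for the same query return different rows, with no error on either side.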
>>>> What is a translator supposed to do with such programs?
>>>>
>>>> These are problems for just one type; when you consider the full set of
>>>> types supported, the problem is exponentially larger.
>>>>
>>>> Just look at the differences between functions in dialects:
>>>> https://calcite.apache.org/docs/reference.html
>>>> Calcite has 4 versions of CONCAT_WS, with subtle semantic differences.
>>>>
>>>> Are you aware of any tool, anywhere, that translates between 2 fixed SQL
>>>> dialects while preserving the program's exact behavior in all cases?
>>>> Building a tool for all possible dialects is again exponentially harder.
>>>>
>>>> Since we have established that exact semantics-preserving transformation
>>>> of SQL programs between dialects is impossible, you need to set up a goal
>>>> for the translation effort. What do you really hope to achieve? Then we
>>>> can discuss how it can be done.
>>>>
>>>> Mihai
>>>>
>>>> ________________________________
>>>> From: Yanjing Wang <[email protected]>
>>>> Sent: Wednesday, November 27, 2024 1:02 AM
>>>> To: [email protected] <[email protected]>
>>>> Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
>>>>
>>>> Hello, this discussion has been ongoing for a week. Let's move it
>>>> forward. Does anyone else have any suggestions?
>>>>
>>>> On Tue, Nov 19, 2024 at 14:47, Yanjing Wang <[email protected]>
>>>> wrote:
>>>>
>>>>> 1. RelToSqlConverterTest
>>>>> The class name implies that it tests conversion from RelNode to SQL,
>>>>> but now its RelNode comes from different dialects with a target SQL.
>>>>> It is difficult for me to understand the test case:
>>>>>
>>>>> @Test void testNullCollation() {
>>>>>   final String query = "select * from \"product\" order by \"brand_name\"";
>>>>>   final String expected = "SELECT *\n"
>>>>>       + "FROM \"foodmart\".\"product\"\n"
>>>>>       + "ORDER BY \"brand_name\"";
>>>>>   final String sparkExpected = "SELECT *\n"
>>>>>       + "FROM `foodmart`.`product`\n"
>>>>>       + "ORDER BY `brand_name` NULLS LAST";
>>>>>   sql(query)
>>>>>       .withPresto().ok(expected)
>>>>>       .withSpark().ok(sparkExpected);
>>>>> }
>>>>>
>>>>> Why does the Spark SQL have 'NULLS LAST' at the end? That information is
>>>>> missing if we don't add a source rel or a source dialect.
>>>>>
>>>>> 2. Dialect-to-dialect translation
>>>>> I think it's necessary. Dialect translation and materialized view
>>>>> substitution are common in the big data domain; it would be beneficial
>>>>> to make Calcite more user-friendly for these scenarios.
>>>>> Could we create end-to-end test cases that start with the source SQL of
>>>>> one dialect and end with the target SQL of another (or the same)
>>>>> dialect? We could also include user-defined materialized views in the
>>>>> process and perform result comparison.
>>>>>
>>>>> On Tue, Nov 19, 2024 at 07:21, Julian Hyde <[email protected]> wrote:
>>>>>
>>>>>> A recent case, https://issues.apache.org/jira/browse/CALCITE-6693,
>>>>>> "Add Source SQL Dialect to RelToSqlConverterTest", implies that people
>>>>>> are using Calcite to translate SQL from one dialect to another. The
>>>>>> test wanted to test translating a SQL string from Presto to Redshift.
>>>>>> I pushed back on that case (and its PR) because that test is for
>>>>>> translating RelNode to a SQL dialect, not about handling source
>>>>>> dialects.
>>>>>>
>>>>>> Dialect-to-dialect translation is undoubtedly something that people do
>>>>>> with Calcite. I think we should recognize that fact, and document how
>>>>>> someone can use Calcite as a translator.
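On the 'NULLS LAST' question raised earlier in the thread: for an ascending sort, engines in the Presto family put NULLs last by default, while Spark puts them first, so a faithful translation to Spark must spell the collation out. The toy unparser below illustrates the shape of that decision; the per-dialect defaults are assumptions for illustration and should be checked against each engine's documentation, and this is not Calcite's actual SqlDialect logic.

```java
public class OrderByNulls {
  enum NullDefault { NULLS_FIRST, NULLS_LAST }

  // Emits an ascending ORDER BY key that preserves "nulls last" semantics:
  // the explicit clause is only needed when the target dialect's default
  // says otherwise.
  static String orderKeyNullsLast(String column, NullDefault dialectDefault) {
    return dialectDefault == NullDefault.NULLS_LAST
        ? column
        : column + " NULLS LAST";
  }

  public static void main(String[] args) {
    // Presto-like dialect: an ascending sort already puts NULLs last.
    System.out.println(orderKeyNullsLast("\"brand_name\"", NullDefault.NULLS_LAST));
    // Spark-like dialect: NULLs come first by default, so be explicit.
    System.out.println(orderKeyNullsLast("`brand_name`", NullDefault.NULLS_FIRST));
  }
}
```

This matches the two expected strings in testNullCollation: the Presto output omits the clause, the Spark output carries `NULLS LAST`.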
>>>>>> When we have documented it, we can also add some tests.
>>>>>>
>>>>>> I am also worried about our dialect tests in general. The surface area
>>>>>> to be tested is huge, and the tests are added haphazardly, so while
>>>>>> many cases are tested there is a much greater set of cases that are
>>>>>> not tested. Consider, for example, how testSelectQueryWithGroupByEmpty
>>>>>> [1] tests against MySQL, Presto and StarRocks but not against
>>>>>> BigQuery, Snowflake or Postgres. If we want our SQL dialect support to
>>>>>> be high quality, we have to find a way to improve the coverage of our
>>>>>> tests. I logged https://issues.apache.org/jira/browse/CALCITE-5529
>>>>>> with some ideas, but I need help implementing it.
>>>>>>
>>>>>> Julian
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/calcite/blob/f2ec11fe7e23ecf2db903bc02c40609242993aad/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java#L577
