I have already contributed an implementation of Sql Logic Test for Calcite, it is part of the "plus" project of Calcite.
SLT is completely complementary with the problem described here. SLT tests only the very core of the SQL language, which should be common to all implementations. It only uses 3 data types (integer, string, decimal), and it never calls any non-standard functions. If you configure Calcite in a specific way, it can pass all SLT tests. But some optimization passes are unsound, and some of these are used in the default configuration (when you use Calcite through a JDBC connection). That's the reason why currently the SLT tests are not being regularly run. In our project we took a very similar approach to the one described by Julian: when we adopt a SQL function from a dialect (we attempt to support functions from many dialects), we test it thoroughly on the original database and write all the same tests using our Calcite implementation. If something doesn't match, I file an issue with Calcite. We test * compile-time evaluation (for constant expressions) * run-time evaluation (including of the same expressions) * tests that should fail at compilation time (e.g. type errors) * tests that should fail at runtime (e.g., division by zero) We try to take the tests from the official test suites, e.g.: Postgres regress: https://github.com/postgres/postgres/tree/bcd5b4bcbefc8b99be3678cf9c6c990a0c28f68e/src/test/regress/expected We have written parsers that read the expected output from the official tests so that we can validate the results (for positive tests) while reusing as much as possible the original tests without introducing bugs in the translation of the tests. As Yiwen Wu says, in general it is impossible to execute a function for a specific dialect on a different runtime and get the same result. But this does not mean we shouldn't implement the functions precisely. In our runtime we reimplement all the functions in Rust, and we can support functions from different runtimes with no problem. But we still need the Java implementation to match the Rust implementation and with the original database implementation. Quidem may be the best way to do it, but it needs negative tests too. Mihai ________________________________ From: Stamatis Zampetakis <zabe...@gmail.com> Sent: Monday, February 26, 2024 5:31 AM To: dev@calcite.apache.org <dev@calcite.apache.org> Subject: Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems The Quidem approach definitely makes sense. Alternatively, since we are focusing on the behavior of functions with certain inputs it may be simpler to go with a Java parameterized test backed by a CSV/JSON file ([1] outlines the general idea). As others mentioned, using the org.testcontainers framework can significantly alleviate the burden of setting up and managing different DBMS. Best, Stamatis [1] https://github.com/apache/calcite/pull/3704 On Mon, Feb 26, 2024 at 4:01 AM Yiwen Wu <yiwenwu1...@gmail.com> wrote: > > I think it is difficult and impossible to make calcite function results > completely consistent with other SQL systems. In practice, when > calculations are pushed down to different data source adapters for > execution, there is no guarantee that the final results will be completely > consistent in calcite. > > I think, for the built-in functions defined in `SqlStdOperatorTable`, we > can choose to follow one of the definitive SQL systems, such as Oracle. For > the engine-related extension functions in `SqlLibraryOperators`, we can > follow the behavior related to the specific engine. If an already defined > function behaves inconsistently with the new engine, we can add a new > Function to the definition, such as the `CONCAT` function. > > At the same time, I agree it is a great suggestion to add execution tests > of different engines in Quidem, which is very effective in verifying the > engine execution behavior. > > Cancai Cai <caic68...@gmail.com> 于2024年2月26日周一 09:59写道: > > > Thank you very much to the calcite community for raising these questions. > > This is what I have been doubting. I am very sorry that this doubt has been > > discussed for so long. > > > > Maybe we also need to consider another issue, that is, the database version > > issue. Versions like mysql and postgres are very stable, but components > > like spark still seem to have function bugs. exists, then how should we > > consider version issues? > > > > I don't know what I can do. Recently I am sorting out some documents about > > the use of some functions of mysql and postgres in calcite. I don't know if > > this is helpful. > > > > Best wishes, > > Cancai Cai > >