Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

Mihai Budiu Mon, 26 Feb 2024 10:14:03 -0800

I have already contributed an implementation of Sql Logic Test for Calcite, it 
is part of the "plus" project of Calcite.

SLT is completely complementary with the problem described here. SLT tests only 
the very core of the SQL language, which should be common to all 
implementations. It only uses 3 data types (integer, string, decimal), and it 
never calls any non-standard functions.

If you configure Calcite in a specific way, it can pass all SLT tests. But some 
optimization passes are unsound, and some of these are used in the default 
configuration (when you use Calcite through a JDBC connection). That's the 
reason why currently the SLT tests are not being regularly run.

In our project we took a very similar approach to the one described by Julian: 
when we adopt a SQL function from a dialect (we attempt to support functions 
from many dialects), we test it thoroughly on the original database and write 
all the same tests using our Calcite implementation. If something doesn't 
match, I file an issue with Calcite.

We test

  *
compile-time evaluation (for constant expressions)
  *
run-time evaluation (including of the same expressions)
  *
tests that should fail at compilation time (e.g. type errors)
  *
tests that should fail at runtime (e.g., division by zero)

We try to take the tests from the official test suites, e.g.: Postgres regress: 
https://github.com/postgres/postgres/tree/bcd5b4bcbefc8b99be3678cf9c6c990a0c28f68e/src/test/regress/expected
We have written parsers that read the expected output from the official tests 
so that we can validate the results (for positive tests) while reusing as much 
as possible the original tests without introducing bugs in the translation of 
the tests.

As Yiwen Wu says, in general it is impossible to execute a function for a 
specific dialect on a different runtime and get the same result. But this does 
not mean we shouldn't implement the functions precisely. In our runtime we 
reimplement all the functions in Rust, and we can support functions from 
different runtimes with no problem. But we still need the Java implementation 
to match the Rust implementation and with the original database implementation.

Quidem may be the best way to do it, but it needs negative tests too.

Mihai

________________________________
From: Stamatis Zampetakis <[email protected]>
Sent: Monday, February 26, 2024 5:31 AM
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL 
systems

The Quidem approach definitely makes sense.

Alternatively, since we are focusing on the behavior of functions with
certain inputs it may be simpler to go with a Java parameterized test
backed by a CSV/JSON file ([1] outlines the general idea).
As others mentioned, using the org.testcontainers framework can
significantly alleviate the burden of setting up and managing
different DBMS.

Best,
Stamatis

[1] https://github.com/apache/calcite/pull/3704

On Mon, Feb 26, 2024 at 4:01 AM Yiwen Wu <[email protected]> wrote:
>
> I think it is difficult and impossible to make calcite function results
> completely consistent with other SQL systems. In practice, when
> calculations are pushed down to different data source adapters for
> execution, there is no guarantee that the final results will be completely
> consistent in calcite.
>
> I think, for the built-in functions defined in `SqlStdOperatorTable`, we
> can choose to follow one of the definitive SQL systems, such as Oracle. For
> the engine-related extension functions in `SqlLibraryOperators`, we can
> follow the behavior related to the specific engine. If an already defined
> function behaves inconsistently with the new engine, we can add a new
> Function to the definition, such as the `CONCAT` function.
>
> At the same time, I agree it is a great suggestion to add execution tests
> of different engines in Quidem, which is very effective in verifying the
> engine execution behavior.
>
> Cancai Cai <[email protected]> 于2024年2月26日周一 09:59写道：
>
> > Thank you very much to the calcite community for raising these questions.
> > This is what I have been doubting. I am very sorry that this doubt has been
> > discussed for so long.
> >
> > Maybe we also need to consider another issue, that is, the database version
> > issue. Versions like mysql and postgres are very stable, but components
> > like spark still seem to have function bugs. exists, then how should we
> > consider version issues?
> >
> > I don't know what I can do. Recently I am sorting out some documents about
> > the use of some functions of mysql and postgres in calcite. I don't know if
> > this is helpful.
> >
> > Best wishes,
> > Cancai Cai
> >

Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

Reply via email to