A couple of weeks ago Vladimir pointed out a new research project from UW that 
checks whether two SQL statements are equivalent. It is called Cosette[1] and 
there is a nice blog post in Medium about it[2]. I have conversed with the 
researchers (Alvin and Shumo) back-channel, but let’s have a discussion whether 
there are any synergies between Apache Calcite and Cosette.

The researchers are on Bcc. I suggest that they join dev@calcite to reply and 
see other messages.

One obvious idea is to use Cosette to audit Calcite’s query transformation 
rules. Each rule is supposed to preserve semantics but (until Cosette) we had 
to trust the author of the rule. We could convert the before and after 
relational expressions to SQL, and then ask Cosette whether those are 
equivalent. We could enable this check in Calcite’s test suite, during which 
many thousands of rules are fired.

I have logged https://issues.apache.org/jira/browse/CALCITE-1977 
<https://issues.apache.org/jira/browse/CALCITE-1977> for this.

A few rules might use other information besides the input relational 
expression, such as predicates that are known to hold or column combinations 
that are known to be unique. But let’s see what happens.

This is a very loose integration of Cosette / Calcite, but we can make closer 
integrations (e.g. within the same JVM, even at runtime) as we discover 
synergies. After all, optimization and theorem-proving are related endeavors. 

Another area that would be useful would be to devise test data. Let’s suppose 
that I am testing Calcite’s implementation of

 select *
 from emp
 where sal = 100
 or sal not in (select age + 100 from emp where deptno > 20)

against a reference implementation (say Postgres) that I know gives the correct 
answer. Even if Calcite has a bug, it may not show up in the default “emp” data 
set. There are a lot of combinations of sal null / not-null, deptno null / 
not-null, and emp empty non / non-empty, and so forth. I presume that Cosette 
is not a reference implementation of SQL, but could Cosette perhaps help 
suggest data sets that would highlight incompatibilities?

There might be applications in materialized views. A query Q can use a 
materialized view V if V covers Q. In other words if Q == R(V) where R is some 
sequence of relational operators. Given Q and V, Cosette could perhaps analyze 
and either return R (success) or return that V does not cover Q (failure).  

Any other ideas?

Julian

[1] http://cosette.cs.washington.edu/ <http://cosette.cs.washington.edu/>

[2] https://medium.com/@uwdb/introducing-cosette-527898504bd6 
<https://medium.com/@uwdb/introducing-cosette-527898504bd6> 

Reply via email to