Re: TypeCoercion and OperandTypeCheckers

Mihai Budiu Wed, 23 Oct 2024 12:18:46 -0700

What TypeCoercion does is to convert implicit casts into explicit casts in the 
program representation. All programming languages have rules about type 
coercion. Here is the Java spec: 
https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html

In other languages, like C++ or Java, functions are not special. A function has 
a definite signature (a type for each parameter and for its result), and 
whenever you call it with arguments that have different types, the compiler 
will use the coercion rules of the language. The situation is a bit more 
complicated when functions can be overloaded (i.e., there exist multiple 
functions with the same name but different signatures), because the compiler 
first has to apply overload resolution rules to figure out which one of the 
overloads is being used 
https://docs.oracle.com/javase/specs/jls/se7/html/jls-8.html#jls-8.4.9 and 
https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12, but 
after that the type coercion rules are applied.

Unfortunately, no one wrote a spec for the Calcite type checker - the code is 
the spec, and the code keeps evolving. In the absence of a spec we can argue 
endlessly about the right approach.

SQL is an unusual language, in that it supports many operations that have 
arguments of different types. For example, addition can take dates, intervals, 
numeric values, and even strings. These are in fact just overloaded functions 
that are written as operators.

Some SQL functions are truly overloaded, but other functions aren't, and should 
require coercions. Some functions are overloaded just because they behave 
differently in different SQL dialects. Unfortunately the TypeCoercion code in 
Calcite doesn't know anything about functions. I think that the current 
implementation of function type checking using the OperandTypeCheckers is 
broken, because it does not convert implicit casts into explicit casts where 
necessary. There have been some other threads about this on the mailing list.

The situation becomes even more complicated when you have generic types, like 
ARRAY or MAP in Calcite. Objects such as the empty array literal do not provide 
enough information to derive a type bottom-up (where the type of an expression 
is inferred from the type of its subexpressions); the type can be any T ARRAY, 
for any type T; "T" is a type variable. In such cases an algorithm like type 
unification must be used; what type unification does is to derive values for 
the type variables involved. If you pass the empty array literal to a function 
that expects INT ARRAY, the unification will infer the fact that T is INT. But 
Calcite does not use type unification. Unification is an important algorithm 
because it propagates type information not only bottom-up, but also top-down. 
In fact, unification solves a system of equations. SQL used to be a simple 
language, in which bottom-up inference was sufficient, but with the addition of 
generic types and overloaded functions, it is no longer such a simple language.

In Calcite TypeCoercion is applied to some of the SQL language built-in 
operators, such as comparisons.
But for functions a completely different mechanism is used, which runs after 
TypeCoercion, and never inserts explicit casts.

To make things even more confusing, turns out that TypeCoercion was broken even 
for comparisons until recently (https://github.com/apache/calcite/pull/3998). I 
actually suspect that there are still cases which are not handled, even for 
comparisons, or for other built-in operators. I plan to revisit this.

And, if this is not complicated enough, Calcite is supposed to be a generic 
framework which can emulate different SQL dialects. But I think different 
dialects have different type coercion rules. Ideally the type coercion rules 
would be part of the TypeSystem, and would depend on dialect.

I am not sure I am answering your question directly.

Mihai

________________________________
From: Abbas Gadhia <[email protected]>
Sent: Wednesday, October 23, 2024 6:58 AM
To: [email protected] <[email protected]>
Subject: Re: TypeCoercion and OperandTypeCheckers

Sorry if i was brief earlier.
In the following statement,
`update t1 set bigint_col = int_col`the type coercion rules in the validator 
kick in and decide that a cast is not needed between `bigint_col` and 
`int_col`. This is because of the type precedence rules of int columns and 
bigint columns (via AbstractTypeCoercion#coerceColumnType and 
AbstractTypeCoercion#needToCast)

However, in a statement like`select * from t1 where bigint_col = int_col`
a cast is applied, since the EQUALS operator has a OperandTypeChecker whose 
Consistency is LEAST_RESTRICTIVE, which mandates that a cast be applied to the 
least restrictive type. This happens during the SqlToRelConversion phase.

What was coming to my mind were 2 things1. Why type coercion is handled in 2 
places.
2. Just like in the `select` statement, the code consulted the EQUALS operator 
to find out whether a cast was needed or not, should there not be a concept of 
a "ASSIGNMENT" operator for INSERT/UPDATE/MERGE that is consulted about the 
type coercion rules rather than a piece of code that sits separately 
(TypeCoercion)

    On Wednesday, 23 October, 2024 at 06:16:19 pm IST, Cancai Cai 
<[email protected]> wrote:

 +1, Agreed. We need more details.

Best wishes,
Cancai Cai

> 2024年10月23日 20:33，Xiong Duan <[email protected]> 写道：
>
> Hi Abbas, I don't understand what you mean. Can you use an SQL
> statement to express what kind of output you want and What the
> Calcite's output is? Then we will discuss the details.

Re: TypeCoercion and OperandTypeCheckers

Reply via email to