What TypeCoercion does is to convert implicit casts into explicit casts in the program representation. All programming languages have rules about type coercion. Here is the Java spec: https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html
In other languages, like C++ or Java, functions are not special. A function has a definite signature (a type for each parameter and for its result), and whenever you call it with arguments that have different types, the compiler will use the coercion rules of the language. The situation is a bit more complicated when functions can be overloaded (i.e., there exist multiple functions with the same name but different signatures), because the compiler first has to apply overload resolution rules to figure out which one of the overloads is being used https://docs.oracle.com/javase/specs/jls/se7/html/jls-8.html#jls-8.4.9 and https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12, but after that the type coercion rules are applied. Unfortunately, no one wrote a spec for the Calcite type checker - the code is the spec, and the code keeps evolving. In the absence of a spec we can argue endlessly about the right approach. SQL is an unusual language, in that it supports many operations that have arguments of different types. For example, addition can take dates, intervals, numeric values, and even strings. These are in fact just overloaded functions that are written as operators. Some SQL functions are truly overloaded, but other functions aren't, and should require coercions. Some functions are overloaded just because they behave differently in different SQL dialects. Unfortunately the TypeCoercion code in Calcite doesn't know anything about functions. I think that the current implementation of function type checking using the OperandTypeCheckers is broken, because it does not convert implicit casts into explicit casts where necessary. There have been some other threads about this on the mailing list. The situation becomes even more complicated when you have generic types, like ARRAY or MAP in Calcite. Objects such as the empty array literal do not provide enough information to derive a type bottom-up (where the type of an expression is inferred from the type of its subexpressions); the type can be any T ARRAY, for any type T; "T" is a type variable. In such cases an algorithm like type unification must be used; what type unification does is to derive values for the type variables involved. If you pass the empty array literal to a function that expects INT ARRAY, the unification will infer the fact that T is INT. But Calcite does not use type unification. Unification is an important algorithm because it propagates type information not only bottom-up, but also top-down. In fact, unification solves a system of equations. SQL used to be a simple language, in which bottom-up inference was sufficient, but with the addition of generic types and overloaded functions, it is no longer such a simple language. In Calcite TypeCoercion is applied to some of the SQL language built-in operators, such as comparisons. But for functions a completely different mechanism is used, which runs after TypeCoercion, and never inserts explicit casts. To make things even more confusing, turns out that TypeCoercion was broken even for comparisons until recently (https://github.com/apache/calcite/pull/3998). I actually suspect that there are still cases which are not handled, even for comparisons, or for other built-in operators. I plan to revisit this. And, if this is not complicated enough, Calcite is supposed to be a generic framework which can emulate different SQL dialects. But I think different dialects have different type coercion rules. Ideally the type coercion rules would be part of the TypeSystem, and would depend on dialect. I am not sure I am answering your question directly. Mihai ________________________________ From: Abbas Gadhia <[email protected]> Sent: Wednesday, October 23, 2024 6:58 AM To: [email protected] <[email protected]> Subject: Re: TypeCoercion and OperandTypeCheckers Sorry if i was brief earlier. In the following statement, `update t1 set bigint_col = int_col`the type coercion rules in the validator kick in and decide that a cast is not needed between `bigint_col` and `int_col`. This is because of the type precedence rules of int columns and bigint columns (via AbstractTypeCoercion#coerceColumnType and AbstractTypeCoercion#needToCast) However, in a statement like`select * from t1 where bigint_col = int_col` a cast is applied, since the EQUALS operator has a OperandTypeChecker whose Consistency is LEAST_RESTRICTIVE, which mandates that a cast be applied to the least restrictive type. This happens during the SqlToRelConversion phase. What was coming to my mind were 2 things1. Why type coercion is handled in 2 places. 2. Just like in the `select` statement, the code consulted the EQUALS operator to find out whether a cast was needed or not, should there not be a concept of a "ASSIGNMENT" operator for INSERT/UPDATE/MERGE that is consulted about the type coercion rules rather than a piece of code that sits separately (TypeCoercion) On Wednesday, 23 October, 2024 at 06:16:19 pm IST, Cancai Cai <[email protected]> wrote: +1, Agreed. We need more details. Best wishes, Cancai Cai > 2024年10月23日 20:33,Xiong Duan <[email protected]> 写道: > > Hi Abbas, I don't understand what you mean. Can you use an SQL > statement to express what kind of output you want and What the > Calcite's output is? Then we will discuss the details.
