osipovartem opened a new issue, #16510:
URL: https://github.com/apache/datafusion/issues/16510

   ### Is your feature request related to a problem or challenge?
   
   Related to #12709
   
   Currently, in BinaryTypeCoercer, we have the following logic for the 
StringConcat operator:
   ```rust
     StringConcat => {
         string_concat_coercion(self.lhs, 
self.rhs).map(Signature::uniform).ok_or_else(|| {
             plan_datafusion_err!(
                 "Cannot infer common string type for string concat operation 
{} {} {}", self.lhs, self.op, self.rhs
             )
         })
     }
   ```
   
   The internal string_concat_coercion function requires at least one side to 
be a string-like type (Utf8, LargeUtf8, Utf8View) and the other side to be 
castable to string:
   ```rust
   /// Coercion rules for string concat.
   /// This is a union of string coercion rules and specified rules:
   /// 1. At least one side of lhs and rhs should be string type (Utf8 / 
LargeUtf8)
   /// 2. Data type of the other side should be able to cast to string type
   fn string_concat_coercion(lhs_type: &DataType, rhs_type: &DataType) -> 
Option<DataType> {
       use arrow::datatypes::DataType::*;
       string_coercion(lhs_type, rhs_type).or(match (lhs_type, rhs_type) {
           (Utf8View, from_type) | (from_type, Utf8View) => {
               string_concat_internal_coercion(from_type, &Utf8View)
           }
           (Utf8, from_type) | (from_type, Utf8) => {
               string_concat_internal_coercion(from_type, &Utf8)
           }
           (LargeUtf8, from_type) | (from_type, LargeUtf8) => {
               string_concat_internal_coercion(from_type, &LargeUtf8)
           }
           (Dictionary(_, lhs_value_type), Dictionary(_, rhs_value_type)) => {
               string_coercion(lhs_value_type, rhs_value_type).or(None)
           }
           _ => None,
       })
   }
   ```
   However, this implementation fails if both operands are non-string types 
that can still be casted to string, for example:
   ```sql
   select 1||2;
   +----------------------+
   | Int64(1) || Int64(2) |
   |----------------------|
   | 12                   |
   +----------------------+
   ```
   This use case could be safely supported by leveraging existing 
TypeCoercionRewriter, which is already responsible for inserting casts during 
logical plan rewriting:
   ```rust
     Expr::BinaryExpr(BinaryExpr { left, op, right }) => {
         let (left, right) =
             self.coerce_binary_op(*left, self.schema, op, *right, 
self.schema)?;
         Ok(Transformed::yes(Expr::BinaryExpr(BinaryExpr::new(
             Box::new(left),
             op,
             Box::new(right),
         ))))
     }
   ```
   
   ### Describe the solution you'd like
   
   **Proposal**
   We should update string_concat_coercion to also accept types where both 
sides are castable to string, not just one side being a string already.
   This would allow expressions like 1 || 2 to succeed by coercing both 
operands to strings prior to concatenation.
   This change aligns with the behavior in SQL engines like PostgreSQL and 
makes coercion rules for || more intuitive and consistent with general type 
coercion strategy.
   
   ```rust
   fn string_concat_coercion(lhs_type: &DataType, rhs_type: &DataType) -> 
Option<DataType> {
       use arrow::datatypes::DataType::*;
       string_coercion(lhs_type, rhs_type).or(match (lhs_type, rhs_type) {
           (Utf8View, from_type) | (from_type, Utf8View) => {
               string_concat_internal_coercion(from_type, &Utf8View)
           }
           (Utf8, from_type) | (from_type, Utf8) => {
               string_concat_internal_coercion(from_type, &Utf8)
           }
           (LargeUtf8, from_type) | (from_type, LargeUtf8) => {
               string_concat_internal_coercion(from_type, &LargeUtf8)
           }
           (Dictionary(_, lhs_value_type), Dictionary(_, rhs_value_type)) => {
               string_coercion(lhs_value_type, rhs_value_type).or(None)
           }
           _ =>
               if can_cast_types(lhs_type, &Utf8) && can_cast_types(rhs_type, 
&Utf8) {
                   Some(Utf8)
               } else if can_cast_types(lhs_type, &LargeUtf8) && 
can_cast_types(rhs_type, &LargeUtf8) {
                   Some(LargeUtf8)
               } else {
                   None
               }
       })
   }
   ```
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to