osipovartem opened a new issue, #16510:
URL: https://github.com/apache/datafusion/issues/16510
### Is your feature request related to a problem or challenge?
Related to #12709
Currently, in BinaryTypeCoercer, we have the following logic for the
StringConcat operator:
```rust
StringConcat => {
string_concat_coercion(self.lhs,
self.rhs).map(Signature::uniform).ok_or_else(|| {
plan_datafusion_err!(
"Cannot infer common string type for string concat operation
{} {} {}", self.lhs, self.op, self.rhs
)
})
}
```
The internal string_concat_coercion function requires at least one side to
be a string-like type (Utf8, LargeUtf8, Utf8View) and the other side to be
castable to string:
```rust
/// Coercion rules for string concat.
/// This is a union of string coercion rules and specified rules:
/// 1. At least one side of lhs and rhs should be string type (Utf8 /
LargeUtf8)
/// 2. Data type of the other side should be able to cast to string type
fn string_concat_coercion(lhs_type: &DataType, rhs_type: &DataType) ->
Option<DataType> {
use arrow::datatypes::DataType::*;
string_coercion(lhs_type, rhs_type).or(match (lhs_type, rhs_type) {
(Utf8View, from_type) | (from_type, Utf8View) => {
string_concat_internal_coercion(from_type, &Utf8View)
}
(Utf8, from_type) | (from_type, Utf8) => {
string_concat_internal_coercion(from_type, &Utf8)
}
(LargeUtf8, from_type) | (from_type, LargeUtf8) => {
string_concat_internal_coercion(from_type, &LargeUtf8)
}
(Dictionary(_, lhs_value_type), Dictionary(_, rhs_value_type)) => {
string_coercion(lhs_value_type, rhs_value_type).or(None)
}
_ => None,
})
}
```
However, this implementation fails if both operands are non-string types
that can still be casted to string, for example:
```sql
select 1||2;
+----------------------+
| Int64(1) || Int64(2) |
|----------------------|
| 12 |
+----------------------+
```
This use case could be safely supported by leveraging existing
TypeCoercionRewriter, which is already responsible for inserting casts during
logical plan rewriting:
```rust
Expr::BinaryExpr(BinaryExpr { left, op, right }) => {
let (left, right) =
self.coerce_binary_op(*left, self.schema, op, *right,
self.schema)?;
Ok(Transformed::yes(Expr::BinaryExpr(BinaryExpr::new(
Box::new(left),
op,
Box::new(right),
))))
}
```
### Describe the solution you'd like
**Proposal**
We should update string_concat_coercion to also accept types where both
sides are castable to string, not just one side being a string already.
This would allow expressions like 1 || 2 to succeed by coercing both
operands to strings prior to concatenation.
This change aligns with the behavior in SQL engines like PostgreSQL and
makes coercion rules for || more intuitive and consistent with general type
coercion strategy.
```rust
fn string_concat_coercion(lhs_type: &DataType, rhs_type: &DataType) ->
Option<DataType> {
use arrow::datatypes::DataType::*;
string_coercion(lhs_type, rhs_type).or(match (lhs_type, rhs_type) {
(Utf8View, from_type) | (from_type, Utf8View) => {
string_concat_internal_coercion(from_type, &Utf8View)
}
(Utf8, from_type) | (from_type, Utf8) => {
string_concat_internal_coercion(from_type, &Utf8)
}
(LargeUtf8, from_type) | (from_type, LargeUtf8) => {
string_concat_internal_coercion(from_type, &LargeUtf8)
}
(Dictionary(_, lhs_value_type), Dictionary(_, rhs_value_type)) => {
string_coercion(lhs_value_type, rhs_value_type).or(None)
}
_ =>
if can_cast_types(lhs_type, &Utf8) && can_cast_types(rhs_type,
&Utf8) {
Some(Utf8)
} else if can_cast_types(lhs_type, &LargeUtf8) &&
can_cast_types(rhs_type, &LargeUtf8) {
Some(LargeUtf8)
} else {
None
}
})
}
```
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]