findepi commented on code in PR #12224: URL: https://github.com/apache/datafusion/pull/12224#discussion_r1739649597
########## datafusion/functions/src/string/concat.rs: ########## @@ -64,13 +66,19 @@ impl ScalarUDFImpl for ConcatFunc { &self.signature } - fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> { - Ok(Utf8) + fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { + use DataType::*; + Ok(match &arg_types[0] { + Utf8View => Utf8View, + LargeUtf8 => LargeUtf8, + _ => Utf8, + }) Review Comment: > when you say binary concat operator are you talking about `||` as an operator? yes > the logic seems to assume all arguments are of the same type? > > also, why not always return `Utf8`? the code performing actual concatenation seems to be always the same. the question still holds (why exactly we bias towards the first param type), but i am no longer convinced about my suggestion to use Utf8 always. i think we should "just" make sure `concat(a, b, c)` is type-equivalent to `a || b || c`. the `||`'s logic apparently is - if any of the operands are Utf8View, the result is Utf8View - else, if any of the operands are LargeUtf8, the result is LargeUtf8 cc @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org