Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21704#discussion_r200134825

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
         }
       }

    -  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
    +  override def dataType: DataType = {
    +    val dataTypes = children.map(_.dataType)
    +    dataTypes.headOption.map {
    +      case ArrayType(et, _) =>
    +        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
    --- End diff --

    @ueshin For ```Concat```, ```Coalesce```, etc. that seems to be the case, since a coercion rule is executed if there is any nullability difference at any level of nesting. But it's not the case for the ```CaseWhenCoercion``` rule, since the ```sameType``` method is used for the comparison there. If the goal is to avoid generating extra ```Cast``` expressions, shouldn't the other coercion rules use the ```sameType``` method as well?

    Let's assume that the result of ```concat``` is subsequently used by ```flatten```. Wouldn't that lead to the generation of extra null-safe checks, as mentioned [here](https://github.com/apache/spark/pull/21704#discussion_r200110924)?
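
To make the distinction concrete, here is a minimal, self-contained sketch. It uses a toy type model, not the real `org.apache.spark.sql.types` classes. The names `DType`, `ArrType`, `sameTypeIgnoringNullability` and `concatResultType` are hypothetical and exist only for illustration. The fallback `case other => other` is also an assumption, since the diff hunk above is cut off before its remaining cases.

```scala
// Toy stand-ins for Spark SQL data types (hypothetical, illustration only).
sealed trait DType
case object StrType extends DType
case object IntType extends DType
final case class ArrType(elementType: DType, containsNull: Boolean) extends DType

object NullabilitySketch {
  // Compares two types while ignoring containsNull at every nesting level,
  // in the spirit of the sameType comparison discussed above.
  def sameTypeIgnoringNullability(a: DType, b: DType): Boolean = (a, b) match {
    case (ArrType(ea, _), ArrType(eb, _)) => sameTypeIgnoringNullability(ea, eb)
    case _ => a == b
  }

  // Mirrors the shape of the dataType override in the diff: take the element
  // type of the first child and mark the result as containing nulls if any
  // child array may contain nulls.
  def concatResultType(children: Seq[DType]): DType =
    children.headOption.map {
      case ArrType(et, _) =>
        ArrType(et, children.exists {
          case ArrType(_, containsNull) => containsNull
          case _ => false
        })
      case other => other // assumption: non-array inputs keep the first child's type
    }.getOrElse(StrType)
}
```

With `a = ArrType(IntType, false)` and `b = ArrType(IntType, true)`, plain equality (`a == b`) is false, so a rule based on strict equality would treat them as different types and insert a cast. `sameTypeIgnoringNullability(a, b)` is true, so a rule based on it would leave them alone and rely on the expression's `dataType` to report the merged nullability.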