Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21704#discussion_r200054252

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
         }
       }

    -  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
    +  override def dataType: DataType = {
    +    val dataTypes = children.map(_.dataType)
    +    dataTypes.headOption.map {
    +      case ArrayType(et, _) =>
    +        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
    +      case dt => dt
    +    }.getOrElse(StringType)
    +  }
    --- End diff --

    So far, we've also identified ```CaseWhen``` and ```If```, discussed [here](https://github.com/apache/spark/pull/21687). I've just noticed that ```Coalesce``` also looks suspicious.

    What is the key purpose of ```SimplifyCasts```: to remove an extra expression node, or to avoid casts between identical types? If the latter is the purpose, what about changing the ```SimplifyCasts``` rule to replace ```Cast``` with a new dummy cast expression that would hold only the target data type and wouldn't perform any casting?
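The nullability-merging idea in the diff above can be sketched standalone. The stand-in `DataType`/`ArrayType` definitions below are simplified models, not the real Catalyst classes; the point is only that the result's ```containsNull``` must be the OR across all children's element nullability, not just the first child's:

```scala
// Simplified stand-ins for Catalyst data types (illustration only).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

// Mirrors the patched Concat.dataType: take the first child's type, but
// widen containsNull to true if ANY input array may contain nulls.
def concatDataType(childTypes: Seq[DataType]): DataType =
  childTypes.headOption.map {
    case ArrayType(et, _) =>
      ArrayType(et, childTypes.exists {
        case ArrayType(_, cn) => cn
        case _                => false
      })
    case dt => dt
  }.getOrElse(StringType)
```

For example, concatenating ```array<int> (containsNull = false)``` with ```array<int> (containsNull = true)``` must report ```containsNull = true```, which the original ```headOption```-only implementation got wrong.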