Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21704#discussion_r200054252

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
         }
       }

    -  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
    +  override def dataType: DataType = {
    +    val dataTypes = children.map(_.dataType)
    +    dataTypes.headOption.map {
    +      case ArrayType(et, _) =>
    +        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
    +      case dt => dt
    +    }.getOrElse(StringType)
    +  }
    --- End diff --

    So far, we've also identified ```CaseWhen``` and ```If```, discussed [here](https://github.com/apache/spark/pull/21687). I've just noticed that ```Coalesce``` also looks suspicious.

    What is the key purpose of ```SimplifyCasts```: to remove an extra expression node, or to avoid casts between identical types? If the latter is the purpose, what about changing the ```SimplifyCasts``` rule to replace ```Cast``` with a new dummy cast expression that would hold only the target data type and wouldn't perform any casting?
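The nullability-merging idea in the diff above can be sketched standalone. The stand-in `DataType`/`ArrayType` definitions below are simplified models, not the real Catalyst classes; the point is only that the result's ```containsNull``` must be the OR across all children's element nullability, not just the first child's:

```scala
// Simplified stand-ins for Catalyst data types (illustration only).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

// Mirrors the patched Concat.dataType: take the first child's type, but
// widen containsNull to true if ANY input array may contain nulls.
def concatDataType(childTypes: Seq[DataType]): DataType =
  childTypes.headOption.map {
    case ArrayType(et, _) =>
      ArrayType(et, childTypes.exists {
        case ArrayType(_, cn) => cn
        case _                => false
      })
    case dt => dt
  }.getOrElse(StringType)
```

For example, concatenating ```array<int> (containsNull = false)``` with ```array<int> (containsNull = true)``` must report ```containsNull = true```, which the original ```headOption```-only implementation got wrong.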