Shardul Mahadik created SPARK-37822:
---------------------------------------

             Summary: SQL function `split` should return an array of non-nullable elements
                 Key: SPARK-37822
                 URL: https://issues.apache.org/jira/browse/SPARK-37822
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Shardul Mahadik


Currently, {{split}} [returns the data type|https://github.com/apache/spark/blob/08dd010860cc176a33073928f4c0780d0ee98a08/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L532]
{{ArrayType(StringType)}}, which defaults to {{containsNull = true}} and so
declares that the resulting array can contain null elements. However, I do not
see any case where the array can actually contain nulls.

If either the provided string or the delimiter is NULL, the output is a NULL
array. For an empty string, or when there are no characters between
delimiters, the output array contains empty strings but never NULLs. So I
propose we change the return type of {{split}} to mark its elements as
non-null.
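
The cases above can be illustrated with a small standalone sketch. This is a toy model in plain Python, not Spark's implementation: {{split_model}} is a hypothetical helper, Python's {{re.split}} stands in for the regex engine Spark uses, and the sample inputs are made up for illustration. It only shows the shape of the argument: the result is either NULL as a whole or an array whose elements are all non-null strings.

```python
import re
from typing import List, Optional


def split_model(text: Optional[str], pattern: Optional[str]) -> Optional[List[str]]:
    """Toy model of the null behavior described above (hypothetical helper)."""
    # A NULL string or NULL delimiter yields a NULL array,
    # not an array that contains NULL elements.
    if text is None or pattern is None:
        return None
    # Assumption: the pattern has no capture groups. Python's re.split can
    # emit None for unmatched groups, which is a Python-specific quirk and
    # not part of the behavior being modeled here.
    return re.split(pattern, text)


# Sample inputs (illustrative only): a normal string, an empty string,
# adjacent and trailing delimiters, and the two NULL-argument cases.
cases = [("a,b,c", ","), ("", ","), ("a,,b,", ","), (None, ","), ("a,b", None)]

for text, pattern in cases:
    result = split_model(text, pattern)
    # Either the whole array is NULL, or every element is a non-null string.
    assert result is None or all(piece is not None for piece in result)
    print(repr(text), repr(pattern), "->", result)
```

In Spark terms, the proposal corresponds to declaring the result as {{ArrayType(StringType, containsNull = false)}} rather than relying on the default.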



