[ https://issues.apache.org/jira/browse/SPARK-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Li updated SPARK-11277:
---------------------------
    Description: 
I was trying out the sort_array function and hit the exception shown below.

I looked into the Spark source code and found that the root cause is that
sort_array does not check for an array whose elements are all NULL. Sorting an
array made up entirely of NULLs is not meaningful anyway. A similar issue exists
with an array of struct type.
I already have a fix for this issue and am going to create a pull request for
it.

scala> sqlContext.sql("select sort_array(array(null, null)) from t1").show()
scala.MatchError: ArrayType(NullType,true) (of class org.apache.spark.sql.types.ArrayType)
        at org.apache.spark.sql.catalyst.expressions.SortArray.lt$lzycompute(collectionOperations.scala:68)
        at org.apache.spark.sql.catalyst.expressions.SortArray.lt(collectionOperations.scala:67)
        at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:111)
        at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:341)
        at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:440)
        at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:433)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
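
For illustration only, here is a minimal, self-contained sketch of how a
pattern match that covers only some array element types can produce a
scala.MatchError like the one in this report, and how an explicit check can
turn it into an ordinary error message. The types and the SortArraySketch
object are hypothetical stand-ins, not Spark's actual classes, and the guarded
version is an assumption about one possible shape of a check, not the fix
mentioned in this issue.

```scala
// Simplified stand-ins for Spark SQL's type hierarchy. These are hypothetical
// and are NOT the real org.apache.spark.sql.types classes.
trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object NullType extends DataType
final case class StructType(fieldNames: Seq[String]) extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

object SortArraySketch {
  private val intOrdering = Ordering.Int.asInstanceOf[Ordering[Any]]
  private val stringOrdering = Ordering.String.asInstanceOf[Ordering[Any]]

  // Only two element types are handled. Any other element type, such as
  // NullType or StructType, falls through and throws scala.MatchError.
  def unguardedOrdering(dataType: DataType): Ordering[Any] = dataType match {
    case ArrayType(IntegerType, _) => intOrdering
    case ArrayType(StringType, _)  => stringOrdering
  }

  // Guarded variant: unsupported inputs are reported as a message (Left)
  // instead of escaping as a MatchError.
  def checkedOrdering(dataType: DataType): Either[String, Ordering[Any]] = dataType match {
    case ArrayType(IntegerType, _) => Right(intOrdering)
    case ArrayType(StringType, _)  => Right(stringOrdering)
    case ArrayType(other, _)       => Left(s"cannot sort an array with element type $other")
    case other                     => Left(s"expected an array type, but got $other")
  }

  def main(args: Array[String]): Unit = {
    // Models the type of array(null, null).
    val allNulls = ArrayType(NullType, containsNull = true)

    // The unguarded match fails at evaluation time.
    println(scala.util.Try(unguardedOrdering(allNulls)))

    // The guarded match reports the unsupported element type as a message.
    println(checkedOrdering(allNulls))
    println(checkedOrdering(ArrayType(StructType(Seq("a")), containsNull = true)))
  }
}
```

In this sketch an array of struct type takes the same path as the array of
NULLs, which is consistent with the similar issue noted in the description.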
        

  was:
I was trying out the sort_array function and hit the exception shown below.

I looked into the Spark source code and found that the root cause is that
sort_array does not check for an array whose elements are all NULL. Sorting an
array made up entirely of NULLs is not meaningful anyway.
I already have a fix for this issue and am going to create a pull request for
it.

scala> sqlContext.sql("select sort_array(array(null, null)) from t1").show()
scala.MatchError: ArrayType(NullType,true) (of class org.apache.spark.sql.types.ArrayType)
        at org.apache.spark.sql.catalyst.expressions.SortArray.lt$lzycompute(collectionOperations.scala:68)
        at org.apache.spark.sql.catalyst.expressions.SortArray.lt(collectionOperations.scala:67)
        at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:111)
        at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:341)
        at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:440)
        at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:433)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        


> sort_array throws exception scala.MatchError
> --------------------------------------------
>
>                 Key: SPARK-11277
>                 URL: https://issues.apache.org/jira/browse/SPARK-11277
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: Linux
>            Reporter: Jia Li



