[ https://issues.apache.org/jira/browse/SPARK-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Li updated SPARK-11277:
---------------------------
    Description:
I was trying out the sort_array function and hit this exception.
I looked into the Spark source code and found that the root cause is that sort_array does not check for an array of NULLs. Sorting an array made up entirely of NULLs is not meaningful anyway. A similar issue exists with an array of struct type.
I already have a fix for this issue and I'm going to create a pull request for it.

scala> sqlContext.sql("select sort_array(array(null, null)) from t1").show()
scala.MatchError: ArrayType(NullType,true) (of class org.apache.spark.sql.types.ArrayType)
        at org.apache.spark.sql.catalyst.expressions.SortArray.lt$lzycompute(collectionOperations.scala:68)
        at org.apache.spark.sql.catalyst.expressions.SortArray.lt(collectionOperations.scala:67)
        at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:111)
        at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:341)
        at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:440)
        at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:433)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

  was:
I was trying out the sort_array function and hit this exception.
I looked into the Spark source code and found that the root cause is that sort_array does not check for an array of NULLs. Sorting an array made up entirely of NULLs is not meaningful anyway.
I already have a fix for this issue and I'm going to create a pull request for it.

scala> sqlContext.sql("select sort_array(array(null, null)) from t1").show()
scala.MatchError: ArrayType(NullType,true) (of class org.apache.spark.sql.types.ArrayType)
        at org.apache.spark.sql.catalyst.expressions.SortArray.lt$lzycompute(collectionOperations.scala:68)
        at org.apache.spark.sql.catalyst.expressions.SortArray.lt(collectionOperations.scala:67)
        at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:111)
        at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:341)
        at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:440)
        at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:433)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)


> sort_array throws exception scala.MatchError
> --------------------------------------------
>
>                 Key: SPARK-11277
>                 URL: https://issues.apache.org/jira/browse/SPARK-11277
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: Linux
>            Reporter: Jia Li
>
> I was trying out the sort_array function and hit this exception.
> I looked into the Spark source code and found that the root cause is that
> sort_array does not check for an array of NULLs. Sorting an array made up
> entirely of NULLs is not meaningful anyway. A similar issue exists with an
> array of struct type.
> I already have a fix for this issue and I'm going to create a pull request
> for it.
>
> scala> sqlContext.sql("select sort_array(array(null, null)) from t1").show()
> scala.MatchError: ArrayType(NullType,true) (of class org.apache.spark.sql.types.ArrayType)
>         at org.apache.spark.sql.catalyst.expressions.SortArray.lt$lzycompute(collectionOperations.scala:68)
>         at org.apache.spark.sql.catalyst.expressions.SortArray.lt(collectionOperations.scala:67)
>         at org.apache.spark.sql.catalyst.expressions.SortArray.nullSafeEval(collectionOperations.scala:111)
>         at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:341)
>         at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:440)
>         at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$9$$anonfun$applyOrElse$2.applyOrElse(Optimizer.scala:433)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>         at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
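
The root cause named in the report, a pattern match that has no case for ArrayType(NullType, ...), can be illustrated with a small self-contained Scala sketch. Everything below is an assumption made for illustration: the types are simplified stand-ins rather than Spark's real classes, the names SortArraySketch, uncheckedOrdering and sortArray are hypothetical, and the guard shown is only one possible way to handle the case, not necessarily what the pull request does.

```scala
object SortArraySketch {
  // Hypothetical, simplified stand-ins for Spark SQL's type classes (not the real API).
  trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case object NullType extends DataType
  final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  // Unguarded lookup: only orderable element types are matched, so
  // ArrayType(NullType, _) falls through and throws scala.MatchError.
  def uncheckedOrdering(dt: DataType): Ordering[Any] = dt match {
    case ArrayType(IntegerType, _) => Ordering.Int.asInstanceOf[Ordering[Any]]
    case ArrayType(StringType, _)  => Ordering.String.asInstanceOf[Ordering[Any]]
  }

  // Guarded sort (an assumed approach): an array whose element type is NullType
  // has nothing to order, so it is returned unchanged instead of reaching the
  // unguarded match above.
  def sortArray(values: Seq[Any], dt: DataType): Seq[Any] = dt match {
    case ArrayType(NullType, _) => values
    case _                      => values.sorted(uncheckedOrdering(dt))
  }
}
```

In this sketch, calling uncheckedOrdering with ArrayType(NullType, true) fails with a scala.MatchError much like the one in the trace, while sortArray returns the all-NULL input as is.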