[ https://issues.apache.org/jira/browse/SPARK-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655045#comment-14655045 ]
Apache Spark commented on SPARK-9323:
-------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/7957

> DataFrame.orderBy gives confusing analysis errors when ordering based on nested columns
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-9323
>                 URL: https://issues.apache.org/jira/browse/SPARK-9323
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.1, 1.4.1, 1.5.0
>            Reporter: Josh Rosen
>
> The following two queries should be equivalent, but the second crashes:
> {code}
> sqlContext.read.json(sqlContext.sparkContext.makeRDD(
>   """{"a": {"b": 1, "a": {"a": 1}}, "c": [{"d": 1}]}""" :: Nil))
>   .registerTempTable("nestedOrder")
> checkAnswer(sql("SELECT a.b FROM nestedOrder ORDER BY a.b"), Row(1))
> checkAnswer(sql("select * from nestedOrder").select("a.b").orderBy("a.b"), Row(1))
> {code}
> Here's the stacktrace:
> {code}
> Cannot resolve column name "a.b" among (b);
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "a.b" among (b);
>     at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:159)
>     at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:159)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:158)
>     at org.apache.spark.sql.DataFrame.col(DataFrame.scala:651)
>     at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:640)
>     at org.apache.spark.sql.DataFrame$$anonfun$sort$1.apply(DataFrame.scala:593)
>     at org.apache.spark.sql.DataFrame$$anonfun$sort$1.apply(DataFrame.scala:593)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at org.apache.spark.sql.DataFrame.sort(DataFrame.scala:593)
>     at org.apache.spark.sql.DataFrame.orderBy(DataFrame.scala:624)
>     at org.apache.spark.sql.SQLQuerySuite$$anonfun$96.apply$mcV$sp(SQLQuerySuite.scala:1389)
> {code}
> Per [~marmbrus], the problem may be that {{DataFrame.resolve}} calls
> {{resolveQuoted}}, causing the nested field to be treated as a single field
> named {{a.b}}.
> UPDATE: here's a shorter one-liner reproduction:
> {code}
> val df = sqlContext.read.json(sqlContext.sparkContext.makeRDD("""{"a": {"b": 1}}""" :: Nil))
> checkAnswer(df.select("a.b").filter("a.b = a.b"), Row(1))
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
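
For readers unfamiliar with the distinction [~marmbrus] points at, the following is a minimal, Spark-free sketch of two ways a name such as {{a.b}} could be looked up: as one literal column name, or as a dotted path into a nested struct. Every name here ({{ToyType}}, {{ToyStruct}}, {{ResolveSketch}}, {{resolveAsSingleName}}, {{resolveAsPath}}) is hypothetical and exists only for illustration; this is not Spark's resolver and not the change made in the pull request.

```scala
// Toy schema model for illustration only; none of these names are Spark APIs.
sealed trait ToyType
case object ToyInt extends ToyType
final case class ToyStruct(fields: Map[String, ToyType]) extends ToyType

object ResolveSketch {
  // Treats the whole string as one column name, the way a quoted lookup would.
  def resolveAsSingleName(schema: ToyStruct, name: String): Option[ToyType] =
    schema.fields.get(name)

  // Splits on dots and walks nested structs one path segment at a time.
  def resolveAsPath(schema: ToyStruct, name: String): Option[ToyType] =
    name.split('.').foldLeft(Option[ToyType](schema)) {
      case (Some(ToyStruct(fields)), part) => fields.get(part)
      case _                               => None
    }

  // Schema of the record {"a": {"b": 1}} from the shorter reproduction.
  val input: ToyStruct = ToyStruct(Map("a" -> ToyStruct(Map("b" -> ToyInt))))

  // Schema after select("a.b"): a single top-level column named "b",
  // matching the "among (b)" part of the error message.
  val projected: ToyStruct = ToyStruct(Map("b" -> ToyInt))
}
```

In this toy model, {{resolveAsSingleName(input, "a.b")}} finds nothing because no column is literally named {{a.b}}, while {{resolveAsPath(input, "a.b")}} finds the nested integer. Against {{projected}}, neither lookup finds {{a.b}}; only {{b}} resolves there.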