Michael, thanks for the response. Looking forward to trying 1.3.1.
________________________________
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Friday, April 03, 2015 6:52 AM
To: Haopu Wang
Cc: user
Subject: Re: [SparkSQL 1.3.0] Cannot resolve column name "SUM('p.q)" among (k, SUM('p.q));

Thanks for reporting. The root cause is SPARK-5632 <https://issues.apache.org/jira/browse/SPARK-5632>, which is actually pretty hard to fix. Fortunately, for this particular case there is an easy workaround: https://github.com/apache/spark/pull/5337

We can try to include this in 1.3.1.

On Thu, Apr 2, 2015 at 3:29 AM, Haopu Wang <hw...@qilinsoft.com> wrote:

Hi,

I want to rename an aggregation field using the DataFrame API. The aggregation is done on a nested field, but I get the exception below. Do you see the same issue, and is there any workaround? Thank you very much!

======
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot resolve column name "SUM('p.q)" among (k, SUM('p.q));
    at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:162)
    at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:162)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:161)
    at org.apache.spark.sql.DataFrame.col(DataFrame.scala:436)
    at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:426)
    at org.apache.spark.sql.DataFrame$$anonfun$3.apply(DataFrame.scala:244)
    at org.apache.spark.sql.DataFrame$$anonfun$3.apply(DataFrame.scala:243)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.sql.DataFrame.toDF(DataFrame.scala:243)
======

And this code can be used to reproduce the issue:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

case class ChildClass(q: Long)
case class ParentClass(k: String, p: ChildClass)

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("DFTest").setMaster("local[*]")
  val ctx = new SparkContext(conf)
  val sqlCtx = new HiveContext(ctx)
  import sqlCtx.implicits._

  val source = ctx.makeRDD(Seq(ParentClass("c1", ChildClass(100)))).toDF()

  val target = source.groupBy('k).agg('k, sum("p.q"))

  // This line prints the correct contents:
  // k   SUM('p.q)
  // c1  100
  target.show()

  // But this line triggers the exception
  target.toDF("key", "total")
}
======
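For anyone hitting this before a fixed release lands: since the failure happens when toDF tries to re-resolve the generated column name "SUM('p.q)", one plausible workaround is to assign the desired names at aggregation time so that name never needs to be resolved. This is only a sketch (not from the thread), reusing the `source` DataFrame and case classes from the repro above:

```scala
import org.apache.spark.sql.functions._

// Sketch of a workaround: alias the aggregate inside agg() instead of
// renaming afterwards with toDF, so the problematic generated name
// "SUM('p.q)" is never created or looked up.
val renamed = source.groupBy('k)
  .agg(sum("p.q").as("total"))    // nested field aggregated, named up front
  .withColumnRenamed("k", "key")  // rename the grouping column as well

renamed.show()
```

The idea is simply to avoid the post-hoc rename path (DataFrame.toDF) that triggers the resolution bug; whether this works on 1.3.0 exactly as written would need to be verified.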