Michael, thanks for the response. Looking forward to trying 1.3.1.
________________________________
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Friday, April 03, 2015 6:52 AM
To: Haopu Wang
Cc: user
Subject: Re: [SparkSQL 1.3.0] Cannot resolve column name "SUM('p.q)" among (k, SUM('p.q));

Thanks for reporting. The root cause is SPARK-5632 <https://issues.apache.org/jira/browse/SPARK-5632>, which is actually pretty hard to fix. Fortunately, for this particular case there is an easy workaround: https://github.com/apache/spark/pull/5337

We can try to include this in 1.3.1.

On Thu, Apr 2, 2015 at 3:29 AM, Haopu Wang <hw...@qilinsoft.com> wrote:

Hi,

I want to rename an aggregation field using the DataFrame API. The aggregation is done on a nested field, but I get the exception below. Do you see the same issue, and is there any workaround? Thank you very much!

======
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot resolve column name "SUM('p.q)" among (k, SUM('p.q));
    at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:162)
    at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:162)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:161)
    at org.apache.spark.sql.DataFrame.col(DataFrame.scala:436)
    at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:426)
    at org.apache.spark.sql.DataFrame$$anonfun$3.apply(DataFrame.scala:244)
    at org.apache.spark.sql.DataFrame$$anonfun$3.apply(DataFrame.scala:243)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.sql.DataFrame.toDF(DataFrame.scala:243)
======

And this code can be used to reproduce the issue:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

case class ChildClass(q: Long)
case class ParentClass(k: String, p: ChildClass)

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("DFTest").setMaster("local[*]")
  val ctx = new SparkContext(conf)
  val sqlCtx = new HiveContext(ctx)
  import sqlCtx.implicits._

  val source = ctx.makeRDD(Seq(ParentClass("c1", ChildClass(100)))).toDF()

  val target = source.groupBy('k).agg('k, sum("p.q"))

  // This line prints the correct contents:
  // k   SUM('p.q)
  // c1  100
  target.show()

  // But this line triggers the exception
  target.toDF("key", "total")
}
======
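For anyone hitting this before a fixed release lands: since the failure happens when toDF tries to re-resolve the generated column name "SUM('p.q)", one plausible workaround is to assign the desired names at aggregation time so that name never needs to be resolved. This is only a sketch (not from the thread), reusing the `source` DataFrame and case classes from the repro above:

```scala
import org.apache.spark.sql.functions._

// Sketch of a workaround: alias the aggregate inside agg() instead of
// renaming afterwards with toDF, so the problematic generated name
// "SUM('p.q)" is never created or looked up.
val renamed = source.groupBy('k)
  .agg(sum("p.q").as("total"))    // nested field aggregated, named up front
  .withColumnRenamed("k", "key")  // rename the grouping column as well

renamed.show()
```

The idea is simply to avoid the post-hoc rename path (DataFrame.toDF) that triggers the resolution bug; whether this works on 1.3.0 exactly as written would need to be verified.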