Shixiong Zhu created SPARK-4296:
-----------------------------------

Summary: Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause
Key: SPARK-4296
URL: https://issues.apache.org/jira/browse/SPARK-4296
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.1.0
Reporter: Shixiong Zhu
When the input data has a complex (nested) structure, using the same expression in both the GROUP BY clause and the SELECT clause throws "Expression not in GROUP BY".

{code:scala}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Implicit conversion from an RDD of case classes to a SchemaRDD
import sqlContext.createSchemaRDD

case class Birthday(date: String)
case class Person(name: String, birthday: Birthday)

val people = sc.parallelize(List(
  Person("John", Birthday("1990-01-22")),
  Person("Jim", Birthday("1980-02-28"))))
people.registerTempTable("people")

// Same expression, upper(birthday.date), in both SELECT and GROUP BY
val year = sqlContext.sql(
  "select count(*), upper(birthday.date) from people group by upper(birthday.date)")
year.collect() // fails with "Expression not in GROUP BY"
{code}

Here is the plan of {{year}}:

{noformat}
SchemaRDD[3] at RDD at SchemaRDD.scala:105
== Query Plan ==
== Physical Plan ==
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree:
Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date AS date#9) AS c1#3]
 Subquery people
  LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:36
{noformat}

The bug is in the equality test between {{Upper(birthday#1.date)}} (the grouping expression) and {{Upper(birthday#1.date AS date#9)}} (the select-list expression): the select-list version wraps the field access in an Alias, so the two are not recognized as the same expression. Spark SQL may need a mechanism for comparing an aliased expression with its non-aliased equivalent.
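To make the mismatch concrete, here is a minimal, self-contained Scala sketch of one possible mechanism: strip Alias nodes from both sides before comparing. The case classes ({{AttributeRef}}, {{GetField}}, {{Alias}}, {{Upper}}) and the helpers {{trimAliases}} and {{inGroupBy}} are hypothetical stand-ins modelled loosely on the plan above; they are not Catalyst's real classes and not necessarily the fix adopted for this issue.

{code:scala}
// Hypothetical, simplified expression tree (not the real Catalyst classes).
sealed trait Expr
case class AttributeRef(name: String, id: Int) extends Expr
case class GetField(child: Expr, field: String) extends Expr
case class Alias(child: Expr, name: String, id: Int) extends Expr
case class Upper(child: Expr) extends Expr

object AliasAwareEquality {
  // Recursively remove Alias wrappers so only the computation itself remains.
  def trimAliases(e: Expr): Expr = e match {
    case Alias(child, _, _) => trimAliases(child)
    case GetField(child, f) => GetField(trimAliases(child), f)
    case Upper(child)       => Upper(trimAliases(child))
    case a: AttributeRef    => a
  }

  // A select-list expression is valid if it matches some grouping expression
  // once aliases are ignored on both sides.
  def inGroupBy(selected: Expr, groupBy: Seq[Expr]): Boolean =
    groupBy.exists(g => trimAliases(g) == trimAliases(selected))

  def main(args: Array[String]): Unit = {
    val birthday = AttributeRef("birthday", 1)
    // Upper(birthday#1.date)
    val groupExpr = Upper(GetField(birthday, "date"))
    // Upper(birthday#1.date AS date#9) AS c1#3
    val selectExpr = Alias(Upper(Alias(GetField(birthday, "date"), "date", 9)), "c1", 3)

    println(groupExpr == selectExpr)               // false: plain structural equality
    println(inGroupBy(selectExpr, Seq(groupExpr)))  // true: aliases ignored
  }
}
{code}

Plain case-class equality fails because the nested {{Alias}} node changes the tree shape even though the computation is identical; normalizing both sides first makes the comparison depend only on what is computed.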