[ https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935409#comment-15935409 ]
Irina Truong edited comment on SPARK-4296 at 3/21/17 10:01 PM: --------------------------------------------------------------- I have the same exception with pyspark when my expression uses a compiled and registered Scala UDF. This is how it's registered: {noformat} sqlContext.registerJavaFunction("round_date", 'my.package.RoundDate') {noformat} And this is how it's called: {noformat} ipdb> sqlContext.sql("SELECT round_date(t.ts, '1day') from (select timestamp('2017-02-02T10:11:12') as ts union select timestamp('2017-02-02T10:19:00') as ts) as t group by round_date(t.ts, '1day')").show() *** AnalysisException: u"expression 't.`ts`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;\nAggregate [UDF(ts#80, 1day)], [UDF(ts#80, 1day) AS UDF(ts, 1day)#82]\n+- SubqueryAlias t\n +- Distinct\n +- Union\n :- Project [cast(2017-02-02T10:11:12 as timestamp) AS ts#80]\n : +- OneRowRelation$\n +- Project [cast(2017-02-02T10:19:00 as timestamp) AS ts#81]\n +- OneRowRelation$\n" {noformat} was (Author: irinatruong): I have the same exception with pyspark when my expression uses a compiled and registered Scala UDF. This is how it's registered: {noformat} sqlContext.registerJavaFunction("round_date", 'my.package.RoundDate') {noformat} And this is how it's called: {noformat} ipdb> sqlContext.sql("SELECT round_date(t.ts, '1day') from (select timestamp('2017-02-02T10:11:12') as ts union select timestamp('2017-02-02T10:19:00') as ts) as t group by round_date(t.ts, '1day')").show() *** AnalysisException: u"expression 't.`ts`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;\nAggregate [UDF(ts#80, 1day)], [UDF(ts#80, 1day) AS UDF(ts, 1day)#82]\n+- SubqueryAlias t\n +- Distinct\n +- Union\n :- Project [cast(2017-02-02T10:11:12 as timestamp) AS ts#80]\n : +- OneRowRelation$\n +- Project [cast(2017-02-02T10:19:00 as timestamp) AS ts#81]\n +- OneRowRelation$\n" {noformat} > Throw "Expression not in GROUP BY" when using same expression in group by > clause and select clause > --------------------------------------------------------------------------------------------------- > > Key: SPARK-4296 > URL: https://issues.apache.org/jira/browse/SPARK-4296 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.1.0, 1.1.1, 1.2.0 > Reporter: Shixiong Zhu > Assignee: Cheng Lian > Priority: Blocker > Fix For: 1.2.1, 1.3.0 > > > When the input data has a complex structure, using same expression in group > by clause and select clause will throw "Expression not in GROUP BY". > {code:java} > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.createSchemaRDD > case class Birthday(date: String) > case class Person(name: String, birthday: Birthday) > val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), > Person("Jim", Birthday("1980-02-28")))) > people.registerTempTable("people") > val year = sqlContext.sql("select count(*), upper(birthday.date) from people > group by upper(birthday.date)") > year.collect > {code} > Here is the plan of year: > {code:java} > SchemaRDD[3] at RDD at SchemaRDD.scala:105 > == Query Plan == > == Physical Plan == > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression > not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree: > Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date > AS date#9) AS c1#3] > Subquery people > LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at > ExistingRDD.scala:36 > {code} > The bug is the equality test for `Upper(birthday#1.date)` and > `Upper(birthday#1.date AS date#9)`. > Maybe Spark SQL needs a mechanism to compare Alias expression and non-Alias > expression. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org