[ https://issues.apache.org/jira/browse/SPARK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-31663: ------------------------------------ Assignee: (was: Apache Spark) > Grouping sets with having clause returns the wrong result > --------------------------------------------------------- > > Key: SPARK-31663 > URL: https://issues.apache.org/jira/browse/SPARK-31663 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0 > Reporter: Yuanjian Li > Priority: Major > Labels: correctness > > Grouping sets with having clause returns the wrong result when the condition > of having contained conflicting naming. See the below example: > {code:java} > select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING > SETS ((b), (a, b)) having b > 10{code} > The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the > right result should be > {code:java} > +---+ > | b| > +---+ > | 2| > | 2| > +---+{code} > instead of an empty result. > The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as > Filter(..., Agg(...)) and resolved these two parts in different rules. The > CUBE and ROLLUP have the same issue. > Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server > 2017. > > For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested. > {code:java} > spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > > 10; > Time taken: 0.194 seconds > hive> select sum(a) as b from t group by b grouping sets(b) having b > 10; > 2 > Time taken: 1.605 seconds, Fetched: 1 row(s) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org