AngersZhuuuu commented on a change in pull request #30144: URL: https://github.com/apache/spark/pull/30144#discussion_r515926187
########## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ##########
@@ -3691,6 +3691,32 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       checkAnswer(sql("SELECT id FROM t WHERE (SELECT true)"), Row(0L))
     }
   }
+
+  test("SPARK-33229: Support GROUP BY use Separate columns and CUBE/ROLLUP") {
+    withTable("t") {
+      sql("CREATE TABLE t USING PARQUET AS SELECT id AS a, id AS b, id AS c FROM range(1)")
+      checkAnswer(sql("SELECT a, b, c, count(*) FROM t GROUP BY CUBE(a, b, c)"),
+        Row(0, 0, 0, 1) :: Row(0, 0, null, 1) ::
+          Row(0, null, 0, 1) :: Row(0, null, null, 1) ::
+          Row(null, 0, 0, 1) :: Row(null, 0, null, 1) ::
+          Row(null, null, 0, 1) :: Row(null, null, null, 1) :: Nil)
+      checkAnswer(sql("SELECT a, b, c, count(*) FROM t GROUP BY a, CUBE(b, c)"),

Review comment:

> what's the semantic of it?

If we want to analyze by `a` across the different dimensional combinations of `b` and `c`, we currently have to write `group by cube(a, b, c)` and then filter out the rows in which `a` has been rolled up (for example with `HAVING grouping(a) = 0`) to remove the interfering data. With this patch we can simply write

```
group by a, cube(b, c)
```

This set of PRs makes grouping analytics as flexible as in PostgreSQL.
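A minimal Scala sketch of the semantics described above, assuming the PostgreSQL rule that several GROUP BY items combine as the cross product of their individual grouping sets. The `GroupByItem` model, `GroupingSetsExpansion`, and `expand` are hypothetical names for illustration only, not Spark's analyzer API; the CUBE ordering is chosen to mirror the expected rows of the `CUBE(a, b, c)` test.

```scala
// Hypothetical model of GROUP BY items, for illustration only (not Spark's AST).
sealed trait GroupByItem
final case class Col(name: String) extends GroupByItem
final case class Cube(cols: Seq[String]) extends GroupByItem
final case class Rollup(cols: Seq[String]) extends GroupByItem

object GroupingSetsExpansion {
  type GroupingSet = Seq[String]

  // CUBE(c1, ..., cn): all 2^n subsets, in the same order as the expected
  // rows of the CUBE(a, b, c) test above.
  def cubeSets(cols: Seq[String]): Seq[GroupingSet] = cols.toList match {
    case x :: xs =>
      val rest = cubeSets(xs)
      rest.map(x +: _) ++ rest
    case Nil => Seq(Seq.empty[String])
  }

  // ROLLUP(c1, ..., cn): the n + 1 prefixes, longest first.
  def rollupSets(cols: Seq[String]): Seq[GroupingSet] = cols.inits.toList

  // Grouping sets contributed by a single GROUP BY item.
  def itemSets(item: GroupByItem): Seq[GroupingSet] = item match {
    case Col(name)    => Seq(Seq(name))
    case Cube(cols)   => cubeSets(cols)
    case Rollup(cols) => rollupSets(cols)
  }

  // Mixed items combine as the cross product of their grouping sets.
  // Repeated columns inside one set are dropped (grouping by `a` twice is the
  // same as grouping by it once); duplicate sets are kept, as in PostgreSQL.
  def expand(items: Seq[GroupByItem]): Seq[GroupingSet] =
    items.foldLeft(Seq(Seq.empty[String])) { (acc, item) =>
      for {
        left  <- acc
        right <- itemSets(item)
      } yield (left ++ right).distinct
    }

  def main(args: Array[String]): Unit = {
    val mixed = expand(Seq(Col("a"), Cube(Seq("b", "c"))))
    mixed.foreach(set => println(set.mkString("(", ", ", ")")))
    // Same sets as CUBE(a, b, c) restricted to those that keep `a`.
    println(mixed == cubeSets(Seq("a", "b", "c")).filter(_.contains("a")))
  }
}
```

Running `main` prints the four grouping sets `(a, b, c)`, `(a, b)`, `(a, c)`, `(a)` followed by `true`, i.e. `GROUP BY a, CUBE(b, c)` keeps exactly the `CUBE(a, b, c)` sets in which `a` is not rolled up.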
We have real analysis use cases that need this.

########## File path: sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql ##########
@@ -59,4 +59,12 @@ SELECT course, year FROM courseSales GROUP BY CUBE(course, year) ORDER BY groupi
 -- Aliases in SELECT could be used in ROLLUP/CUBE/GROUPING SETS
 SELECT a + b AS k1, b AS k2, SUM(a - b) FROM testData GROUP BY CUBE(k1, k2);
 SELECT a + b AS k, b, SUM(a - b) FROM testData GROUP BY ROLLUP(k, b);
-SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k)
+SELECT a + b, b AS k, SUM(a - b) FROM testData GROUP BY a + b, k GROUPING SETS(k);
+
+-- GROUP BY use mixed Separate columns and CUBE/ROLLUP
+SELECT a, b, count(1) FROM testData GROUP BY a, b, CUBE(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY a, b, ROLLUP(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b);
+SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(b);
+SELECT a, b, count(1) FROM testData GROUP BY CUBE(a, b), ROLLUP(a, b) GROUPING SETS(a, b);

Review comment:

This is currently unsupported, but it can easily be fixed after refactoring GROUPING ANALYTICS.