[ https://issues.apache.org/jira/browse/SPARK-24424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li updated SPARK-24424:
----------------------------
    Description: 
Currently, our GROUP BY clause follows the Hive syntax [https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup]. However, this syntax is not ANSI SQL compliant. The proposal is to update our parser and analyzer for ANSI compliance.

For example, the Hive-style syntax is:
{code:java}
GROUP BY col1, col2 WITH ROLLUP
GROUP BY col1, col2 WITH CUBE
GROUP BY col1, col2 GROUPING SETS ...
{code}
It would be nice to support the ANSI SQL syntax at the same time:
{code:java}
GROUP BY ROLLUP(col1, col2)
GROUP BY CUBE(col1, col2)
GROUP BY GROUPING SETS(...)
{code}
Note that we only need to support one-level grouping sets at this stage; nested grouping sets are not supported.

The parser changes should look like this:

group-by-expressions

>>-GROUP BY----+-hive-sql-group-by-expressions-----+---><
               '-ansi-sql-grouping-set-expressions-'

hive-sql-group-by-expressions

                        '--GROUPING SETS--(--grouping-set-expressions--)--'
   .-,--------------.   +--WITH CUBE--------------------------------------+
   V                |   +--WITH ROLLUP------------------------------------+
>>---+-expression-+-+---+-------------------------------------------------+-><

grouping-expressions-list

   .-,--------------.
   V                |
>>---+-expression-+-+--><

grouping-set-expressions

    .-,----------------------------.
    |      .-,--------------.      |
    |      V                |      |
    V '-(------expression---+-)-'  |
>>----+-expression--------------+--+-><

ansi-sql-grouping-set-expressions

>>-+-ROLLUP--(--grouping-expressions-list--)--------+--><
   +-CUBE--(--grouping-expressions-list--)----------+
   '-GROUPING SETS--(--grouping-set-expressions--)--'

  was:
Currently, our GROUP BY clause follows the Hive syntax [https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup]. However, this syntax is not ANSI SQL compliant. The proposal is to update our parser and analyzer for ANSI compliance.

For example, the Hive-style syntax is:
{code:java}
GROUP BY col1, col2 WITH ROLLUP
GROUP BY col1, col2 WITH CUBE
GROUP BY col1, col2 GROUPING SETS ...
{code}
It would be nice to support the ANSI SQL syntax at the same time:
{code:java}
GROUP BY ROLLUP(col1, col2)
GROUP BY CUBE(col1, col2)
GROUP BY GROUPING SETS(...)
{code}
Note that we only need to support one-level grouping sets at this stage; nested grouping sets are not supported.

The parser changes should look like this:

group-by-expressions

>>-GROUP BY----+-hive-sql-group-by-expressions-----+---><
               *{color:#ff0000}'-ansi-sql-grouping-set-expressions-'{color}*

hive-sql-group-by-expressions

                        '--GROUPING SETS--(--grouping-set-expressions--)--'
   .-,--------------.   +--WITH CUBE--------------------------------------+
   V                |   +--WITH ROLLUP------------------------------------+
>>---+-expression-+-+---+-------------------------------------------------+-><

grouping-expressions-list

   .-,--------------.
   V                |
>>---+-expression-+-+--><

grouping-set-expressions

    .-,----------------------------.
    |      .-,--------------.      |
    |      V                |      |
    V '-(------expression---+-)-'  |
>>----+-expression--------------+--+-><

{color:#ff0000}ansi-sql-grouping-set-expressions{color}

{color:#ff0000}>>-+-ROLLUP--(--grouping-expressions-list--)--------+--><{color}
{color:#ff0000}   +-CUBE--(--grouping-expressions-list--)----------+{color}
{color:#ff0000}   '-GROUPING SETS--(--grouping-set-expressions--)--'{color}


> Support ANSI-SQL compliant syntax for ROLLUP, CUBE and GROUPING SET
> -------------------------------------------------------------------
>
>                 Key: SPARK-24424
>                 URL: https://issues.apache.org/jira/browse/SPARK-24424
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Xiao Li
>            Priority: Major
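
To illustrate how the two syntaxes in the description relate, here is a minimal Python sketch that expands ROLLUP(...) and CUBE(...) into the equivalent one-level grouping sets, using the standard SQL meaning of those operators. The helper names (rollup_sets, cube_sets, to_grouping_sets_sql) are hypothetical and are not part of Spark; rendering the result as a SQL string is for illustration only and is not a claim about how the parser or analyzer change would be implemented.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(c1, ..., cn): every prefix of the column list, longest first, ending with ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(c1, ..., cn): every subset of the column list, largest subsets first."""
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]


def to_grouping_sets_sql(cols, sets):
    """Render one-level grouping sets in the Hive-style GROUP BY ... GROUPING SETS form."""
    rendered = ", ".join("(" + ", ".join(s) + ")" for s in sets)
    return "GROUP BY " + ", ".join(cols) + " GROUPING SETS (" + rendered + ")"


if __name__ == "__main__":
    cols = ["col1", "col2"]
    # Equivalent of GROUP BY ROLLUP(col1, col2)
    print(to_grouping_sets_sql(cols, rollup_sets(cols)))
    # Equivalent of GROUP BY CUBE(col1, col2)
    print(to_grouping_sets_sql(cols, cube_sets(cols)))
```

For n grouping columns, ROLLUP yields n + 1 grouping sets and CUBE yields 2^n, all of them flat, which is consistent with the one-level restriction noted in the description.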