Ruslan Dautkhanov created IMPALA-7204:
-----------------------------------------

             Summary: Add support for GROUP BY ROLLUP
                 Key: IMPALA-7204
                 URL: https://issues.apache.org/jira/browse/IMPALA-7204
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 2.12.0, Impala 3.0
            Reporter: Ruslan Dautkhanov


Suppose we'd like to analyze our sales data to study the amount of sales occurring for different products in different states and regions. Using the ROLLUP feature of SQL 2003, we could issue the query:

{code:sql}
select region, state, product, sum(sales) total_sales
  from sales_history
 group by rollup (region, state, product)
{code}

Semantically, the above query is equivalent to

{code:sql}
select region, state, product, sum(sales) total_sales
  from sales_history
 group by region, state, product
union all
select region, state, null, sum(sales) total_sales
  from sales_history
 group by region, state
union all
select region, null, null, sum(sales) total_sales
  from sales_history
 group by region
union all
select null, null, null, sum(sales) total_sales
  from sales_history
{code}

The query might produce results that look something like:

{noformat}
REGION  STATE  PRODUCT  TOTAL_SALES
------  -----  -------  -----------
null    null   null            6200
EAST    MA     BOATS            100
EAST    MA     CARS            1500
EAST    MA     null            1600
EAST    NY     BOATS            150
EAST    NY     CARS            1000
EAST    NY     null            1150
EAST    null   null            2750
WEST    CA     BOATS            750
WEST    CA     CARS             500
WEST    CA     null            1250
WEST    AZ     BOATS           2000
WEST    AZ     CARS             200
WEST    AZ     null            2200
WEST    null   null            3450
{noformat}

We have a lot of production queries that work around this missing Impala functionality by using three UNION ALLs. The physical execution plan shows that Impala actually reads the full fact table three times, so native support could be a threefold improvement (or more, depending on the number of columns being rolled up).

I can't find another SQL-on-Hadoop engine that doesn't support this feature. I checked Spark, Hive, Pig, Flink and some other engines, and they all support this basic SQL feature. It would be great to have a matching feature in Impala too.
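
To illustrate why native ROLLUP could avoid the repeated scans, here is a minimal Python sketch. It is not Impala code, and it says nothing about how Impala would implement the feature. It runs the UNION ALL workaround in an in-memory SQLite database and compares it with a single pass over the rows that feeds every rollup level at once. The sample rows are hypothetical, chosen so that the totals match the result table above. The function names are made up for this example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical detail rows; their totals match the result table in this issue.
ROWS = [
    ("EAST", "MA", "BOATS", 100),
    ("EAST", "MA", "CARS", 1500),
    ("EAST", "NY", "BOATS", 150),
    ("EAST", "NY", "CARS", 1000),
    ("WEST", "CA", "BOATS", 750),
    ("WEST", "CA", "CARS", 500),
    ("WEST", "AZ", "BOATS", 2000),
    ("WEST", "AZ", "CARS", 200),
]

# The workaround: one SELECT (and one table scan) per rollup level.
UNION_ALL_WORKAROUND = """
select region, state, product, sum(sales) total_sales
  from sales_history group by region, state, product
union all
select region, state, null, sum(sales)
  from sales_history group by region, state
union all
select region, null, null, sum(sales)
  from sales_history group by region
union all
select null, null, null, sum(sales)
  from sales_history
"""


def rollup_via_union_all(rows):
    """Run the UNION ALL workaround against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sales_history "
        "(region text, state text, product text, sales integer)"
    )
    conn.executemany("insert into sales_history values (?, ?, ?, ?)", rows)
    result = conn.execute(UNION_ALL_WORKAROUND).fetchall()
    conn.close()
    return result


def rollup_single_pass(rows):
    """Compute the same totals while reading each input row only once.

    Assumes the grouping columns contain no NULLs; otherwise a data NULL
    would collide with the None used here to mark a rolled-up column.
    """
    totals = defaultdict(int)
    for region, state, product, sales in rows:
        key = (region, state, product)
        # Add the row to every level: (r, s, p), (r, s), (r), and ().
        for level in range(len(key) + 1):
            padded = key[:level] + (None,) * (len(key) - level)
            totals[padded] += sales
    return [group + (total,) for group, total in totals.items()]


if __name__ == "__main__":
    workaround = rollup_via_union_all(ROWS)
    single_pass = rollup_single_pass(ROWS)
    # Row order is not guaranteed in either result, so compare as sets.
    print(set(workaround) == set(single_pass))
```

Both functions return the same 15 rows for this sample, but the workaround query scans `sales_history` once per SELECT branch, while the single pass touches each row once.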