Ruslan Dautkhanov created IMPALA-7204:
-----------------------------------------

             Summary: Add support for GROUP BY ROLLUP
                 Key: IMPALA-7204
                 URL: https://issues.apache.org/jira/browse/IMPALA-7204
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 2.12.0, Impala 3.0
            Reporter: Ruslan Dautkhanov


Now suppose that we'd like to analyze our sales data, to study the amount of 
sales that is occurring for different products, in different states and 
regions. Using the ROLLUP feature of SQL 2003, we could issue the query:

{code:sql}
select region, state, product, sum(sales) total_sales
from sales_history 
group by rollup (region, state, product)
{code}
Semantically, the above query is equivalent to

 
{code:sql}
select region, state, product, sum(sales) total_sales
from sales_history 
group by region, state, product
union
select region, state, null, sum(sales) total_sales
from sales_history 
group by region, state
union
select region, null, null, sum(sales) total_sales
from sales_history 
group by region
union
select null, null, null, sum(sales) total_sales
from sales_history
 
{code}

The query might produce results that looked something like:

{noformat}

REGION STATE PRODUCT TOTAL_SALES
------ ----- ------- -----------
null null null 6200
EAST MA BOATS 100
EAST MA CARS 1500
EAST MA null 1600
EAST NY BOATS 150
EAST NY CARS 1000
EAST NY null 1150
EAST null null 2750
WEST CA BOATS 750
WEST CA CARS 500
WEST CA null 1250
WEST AZ BOATS 2000
WEST AZ CARS 200
WEST AZ null 2200
WEST null null 3450

{noformat}

We have a lot of production queries that work around this missing Impala 
functionality by having three UNION ALLs. Physical execution plan shows Impala 
actually reads full fact table three times. So it could be a three times 
improvement (or more, depending on number of columns that are being rolled up).

I can't find another SQL on Hadoop engine that doesn't support this feature. 
Checked Spark, Hive, PIG, Flink and some other engines - they all do support 
this basic SQL features. Would be great to have a matching feature in Impala 
too. 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to