[jira] [Commented] (IMPALA-7204) Add support for GROUP BY ROLLUP

2019-04-04 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810112#comment-16810112
 ] 

Ruslan Dautkhanov commented on IMPALA-7204:
---

cc [~grahn] - could you please let us know whether you are planning to add 
support for this feature?

Our account team said you might be the right person to ask :)

Thanks for any ideas!


> Add support for GROUP BY ROLLUP
> ---
>
> Key: IMPALA-7204
> URL: https://issues.apache.org/jira/browse/IMPALA-7204
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 3.0, Impala 2.12.0
> Reporter: Ruslan Dautkhanov
> Priority: Major
> Labels: GROUP_BY, sql
>
> Now suppose that we'd like to analyze our sales data, to study the amount of 
> sales that is occurring for different products, in different states and 
> regions. Using the ROLLUP feature of SQL 2003, we could issue the query:
> {code:sql}
> select region, state, product, sum(sales) total_sales
> from sales_history 
> group by rollup (region, state, product)
> {code}
> Semantically, the above query is equivalent to:
> {code:sql}
> -- detail rows: one group per (region, state, product)
> select region, state, product, sum(sales) total_sales
> from sales_history 
> group by region, state, product
> union all
> -- subtotals per (region, state)
> select region, state, null, sum(sales) total_sales
> from sales_history 
> group by region, state
> union all
> -- subtotals per region
> select region, null, null, sum(sales) total_sales
> from sales_history 
> group by region
> union all
> -- grand total
> select null, null, null, sum(sales) total_sales
> from sales_history
> {code}
> The query might produce results that looked something like:
> {noformat}
> REGION STATE PRODUCT TOTAL_SALES
> ------ ----- ------- -----------
> null   null  null           6200
> EAST   MA    BOATS           100
> EAST   MA    CARS           1500
> EAST   MA    null           1600
> EAST   NY    BOATS           150
> EAST   NY    CARS           1000
> EAST   NY    null           1150
> EAST   null  null           2750
> WEST   CA    BOATS           750
> WEST   CA    CARS            500
> WEST   CA    null           1250
> WEST   AZ    BOATS          2000
> WEST   AZ    CARS            200
> WEST   AZ    null           2200
> WEST   null  null           3450
> {noformat}
> We have a lot of production queries that work around this missing Impala 
> functionality with three UNION ALLs. The physical execution plan shows that 
> Impala actually reads the full fact table three times, so native ROLLUP 
> support could be a threefold improvement (or more, depending on the number 
> of columns being rolled up).
> I can't find another SQL-on-Hadoop engine that doesn't support this feature. 
> *I checked Spark, Hive, Pig, Flink and some other engines, and they all 
> support this basic SQL feature*.
> It would be great to have a matching feature in Impala too.
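
The point about repeated scans of the fact table can be illustrated with a minimal Python sketch. It is not Impala's implementation and makes no claim about how Impala would plan such a query; it only shows that ROLLUP-style totals can be accumulated in a single pass over the rows, because each row contributes to every prefix of its grouping key. The helper name rollup_sum and the SALES_HISTORY rows are assumptions made for illustration; the rows are reconstructed from the detail lines of the sample result table in the issue description.

```python
from collections import defaultdict

# Hypothetical sample rows (region, state, product, sales), reconstructed
# from the detail lines of the result table in the issue description.
SALES_HISTORY = [
    ("EAST", "MA", "BOATS", 100),
    ("EAST", "MA", "CARS", 1500),
    ("EAST", "NY", "BOATS", 150),
    ("EAST", "NY", "CARS", 1000),
    ("WEST", "CA", "BOATS", 750),
    ("WEST", "CA", "CARS", 500),
    ("WEST", "AZ", "BOATS", 2000),
    ("WEST", "AZ", "CARS", 200),
]


def rollup_sum(rows, num_keys):
    """Illustrative helper (not an Impala API): ROLLUP-style sums in one pass.

    The first num_keys fields of each row are the grouping columns and the
    next field is the value to sum. None marks a rolled-up column, so this
    sketch assumes the grouping columns themselves never contain None.
    """
    totals = defaultdict(int)
    for row in rows:
        keys, value = row[:num_keys], row[num_keys]
        # Each row feeds every prefix of its key:
        # (region, state, product), (region, state), (region), ().
        for depth in range(num_keys + 1):
            group = keys[:depth] + (None,) * (num_keys - depth)
            totals[group] += value
    return dict(totals)


if __name__ == "__main__":
    for group, total in rollup_sum(SALES_HISTORY, 3).items():
        print(*group, total)
```

The input is read once regardless of how many columns are rolled up, whereas the UNION ALL workaround scans it once per branch. The printed rows follow insertion order rather than the sorted order of the sample table.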



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7204) Add support for GROUP BY ROLLUP

2018-10-23 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661346#comment-16661346
 ] 

Ruslan Dautkhanov commented on IMPALA-7204:
---

Thank you, [~tarmstrong].

Now that IMPALA-110 is resolved, would it be possible to put this on the 
Impala roadmap? Thanks!

> Add support for GROUP BY ROLLUP
> ---
>
> Key: IMPALA-7204
> URL: https://issues.apache.org/jira/browse/IMPALA-7204
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 3.0, Impala 2.12.0
> Reporter: Ruslan Dautkhanov
> Priority: Major
> Labels: GROUP_BY, sql






[jira] [Commented] (IMPALA-7204) Add support for GROUP BY ROLLUP

2018-07-02 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530549#comment-16530549
 ] 

Ruslan Dautkhanov commented on IMPALA-7204:
---

Thanks, [~tarmstrong]!

It would be great if it also added infrastructure to run these heaviest 
operations with intra-node parallelism.

Most of our query execution time comes from count(distinct), as we mostly 
run count(distinct) on high-cardinality values.

cc [~twmarshall]

> Add support for GROUP BY ROLLUP
> ---
>
> Key: IMPALA-7204
> URL: https://issues.apache.org/jira/browse/IMPALA-7204
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.0, Impala 2.12.0
> Reporter: Ruslan Dautkhanov
> Priority: Critical
> Labels: GROUP_BY, sql






[jira] [Commented] (IMPALA-7204) Add support for GROUP BY ROLLUP

2018-06-25 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522649#comment-16522649
 ] 

Tim Armstrong commented on IMPALA-7204:
---

I think IMPALA-110 adds a lot of the required infrastructure.

> Add support for GROUP BY ROLLUP
> ---
>
> Key: IMPALA-7204
> URL: https://issues.apache.org/jira/browse/IMPALA-7204
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.0, Impala 2.12.0
> Reporter: Ruslan Dautkhanov
> Priority: Critical
> Labels: GROUP_BY, sql


