[jira] [Commented] (IMPALA-7204) Add support for GROUP BY ROLLUP
[ https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810112#comment-16810112 ] Ruslan Dautkhanov commented on IMPALA-7204: --- cc [~grahn] - can you please let us know if you guys are planning to add support for this feature? Our Account team said you might be the right person to ask ) Thanks for any ideas! > Add support for GROUP BY ROLLUP > --- > > Key: IMPALA-7204 > URL: https://issues.apache.org/jira/browse/IMPALA-7204 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Ruslan Dautkhanov >Priority: Major > Labels: GROUP_BY, sql > > Now suppose that we'd like to analyze our sales data, to study the amount of > sales that is occurring for different products, in different states and > regions. Using the ROLLUP feature of SQL 2003, we could issue the query: > {code:sql} > select region, state, product, sum(sales) total_sales > from sales_history > group by rollup (region, state, product) > {code} > Semantically, the above query is equivalent to > > {code:sql} > select region, state, product, sum(sales) total_sales > from sales_history > group by region, state, product > union > select region, state, null, sum(sales) total_sales > from sales_history > group by region, state > union > select region, null, null, sum(sales) total_sales > from sales_history > group by region > union > select null, null, null, sum(sales) total_sales > from sales_history > > {code} > The query might produce results that looked something like: > {noformat} > REGION STATE PRODUCT TOTAL_SALES > -- - --- --- > null null null 6200 > EAST MA BOATS 100 > EAST MA CARS 1500 > EAST MA null 1600 > EAST NY BOATS 150 > EAST NY CARS 1000 > EAST NY null 1150 > EAST null null 2750 > WEST CA BOATS 750 > WEST CA CARS 500 > WEST CA null 1250 > WEST AZ BOATS 2000 > WEST AZ CARS 200 > WEST AZ null 2200 > WEST null null 3450 > {noformat} > We have a lot of production queries that work around this missing Impala > functionality by having three UNION ALLs. Physical execution plan shows > Impala actually reads full fact table three times. So it could be a three > times improvement (or more, depending on number of columns that are being > rolled up). > I can't find another SQL on Hadoop engine that doesn't support this feature. > *Checked Spark, Hive, PIG, Flink and some other engines - they all do > support this basic SQL feature*. > Would be great to have a matching feature in Impala too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7204) Add support for GROUP BY ROLLUP
[ https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661346#comment-16661346 ] Ruslan Dautkhanov commented on IMPALA-7204: --- thank you [~tarmstrong] Since now IMPALA-110 is resolved, would it be possible to put this on Impala Roadmap ? thanks! > Add support for GROUP BY ROLLUP > --- > > Key: IMPALA-7204 > URL: https://issues.apache.org/jira/browse/IMPALA-7204 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Ruslan Dautkhanov >Priority: Major > Labels: GROUP_BY, sql > > Now suppose that we'd like to analyze our sales data, to study the amount of > sales that is occurring for different products, in different states and > regions. Using the ROLLUP feature of SQL 2003, we could issue the query: > {code:sql} > select region, state, product, sum(sales) total_sales > from sales_history > group by rollup (region, state, product) > {code} > Semantically, the above query is equivalent to > > {code:sql} > select region, state, product, sum(sales) total_sales > from sales_history > group by region, state, product > union > select region, state, null, sum(sales) total_sales > from sales_history > group by region, state > union > select region, null, null, sum(sales) total_sales > from sales_history > group by region > union > select null, null, null, sum(sales) total_sales > from sales_history > > {code} > The query might produce results that looked something like: > {noformat} > REGION STATE PRODUCT TOTAL_SALES > -- - --- --- > null null null 6200 > EAST MA BOATS 100 > EAST MA CARS 1500 > EAST MA null 1600 > EAST NY BOATS 150 > EAST NY CARS 1000 > EAST NY null 1150 > EAST null null 2750 > WEST CA BOATS 750 > WEST CA CARS 500 > WEST CA null 1250 > WEST AZ BOATS 2000 > WEST AZ CARS 200 > WEST AZ null 2200 > WEST null null 3450 > {noformat} > We have a lot of production queries that work around this missing Impala > functionality by having three UNION ALLs. Physical execution plan shows > Impala actually reads full fact table three times. So it could be a three > times improvement (or more, depending on number of columns that are being > rolled up). > I can't find another SQL on Hadoop engine that doesn't support this feature. > *Checked Spark, Hive, PIG, Flink and some other engines - they all do > support this basic SQL feature*. > Would be great to have a matching feature in Impala too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7204) Add support for GROUP BY ROLLUP
[ https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530549#comment-16530549 ] Ruslan Dautkhanov commented on IMPALA-7204: --- Thanks [~tarmstrong] ! Would be great if it also adds infrastructure to runs these most heavy operations with intra-node parallelism. Most of our query execution times come from count(distinct) as we do mostly count(distinct) on highly-cardinal values.. cc [~twmarshall] > Add support for GROUP BY ROLLUP > --- > > Key: IMPALA-7204 > URL: https://issues.apache.org/jira/browse/IMPALA-7204 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Ruslan Dautkhanov >Priority: Critical > Labels: GROUP_BY, sql > > Now suppose that we'd like to analyze our sales data, to study the amount of > sales that is occurring for different products, in different states and > regions. Using the ROLLUP feature of SQL 2003, we could issue the query: > {code:sql} > select region, state, product, sum(sales) total_sales > from sales_history > group by rollup (region, state, product) > {code} > Semantically, the above query is equivalent to > > {code:sql} > select region, state, product, sum(sales) total_sales > from sales_history > group by region, state, product > union > select region, state, null, sum(sales) total_sales > from sales_history > group by region, state > union > select region, null, null, sum(sales) total_sales > from sales_history > group by region > union > select null, null, null, sum(sales) total_sales > from sales_history > > {code} > The query might produce results that looked something like: > {noformat} > REGION STATE PRODUCT TOTAL_SALES > -- - --- --- > null null null 6200 > EAST MA BOATS 100 > EAST MA CARS 1500 > EAST MA null 1600 > EAST NY BOATS 150 > EAST NY CARS 1000 > EAST NY null 1150 > EAST null null 2750 > WEST CA BOATS 750 > WEST CA CARS 500 > WEST CA null 1250 > WEST AZ BOATS 2000 > WEST AZ CARS 200 > WEST AZ null 2200 > WEST null null 3450 > {noformat} > We have a lot of production queries that work around this missing Impala > functionality by having three UNION ALLs. Physical execution plan shows > Impala actually reads full fact table three times. So it could be a three > times improvement (or more, depending on number of columns that are being > rolled up). > I can't find another SQL on Hadoop engine that doesn't support this feature. > *Checked Spark, Hive, PIG, Flink and some other engines - they all do > support this basic SQL feature*. > Would be great to have a matching feature in Impala too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7204) Add support for GROUP BY ROLLUP
[ https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522649#comment-16522649 ] Tim Armstrong commented on IMPALA-7204: --- I think IMPALA-110 adds a lot of the required infrastructure. > Add support for GROUP BY ROLLUP > --- > > Key: IMPALA-7204 > URL: https://issues.apache.org/jira/browse/IMPALA-7204 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Ruslan Dautkhanov >Priority: Critical > Labels: GROUP_BY, sql > > Now suppose that we'd like to analyze our sales data, to study the amount of > sales that is occurring for different products, in different states and > regions. Using the ROLLUP feature of SQL 2003, we could issue the query: > {code:sql} > select region, state, product, sum(sales) total_sales > from sales_history > group by rollup (region, state, product) > {code} > Semantically, the above query is equivalent to > > {code:sql} > select region, state, product, sum(sales) total_sales > from sales_history > group by region, state, product > union > select region, state, null, sum(sales) total_sales > from sales_history > group by region, state > union > select region, null, null, sum(sales) total_sales > from sales_history > group by region > union > select null, null, null, sum(sales) total_sales > from sales_history > > {code} > The query might produce results that looked something like: > {noformat} > REGION STATE PRODUCT TOTAL_SALES > -- - --- --- > null null null 6200 > EAST MA BOATS 100 > EAST MA CARS 1500 > EAST MA null 1600 > EAST NY BOATS 150 > EAST NY CARS 1000 > EAST NY null 1150 > EAST null null 2750 > WEST CA BOATS 750 > WEST CA CARS 500 > WEST CA null 1250 > WEST AZ BOATS 2000 > WEST AZ CARS 200 > WEST AZ null 2200 > WEST null null 3450 > {noformat} > We have a lot of production queries that work around this missing Impala > functionality by having three UNION ALLs. Physical execution plan shows > Impala actually reads full fact table three times. So it could be a three > times improvement (or more, depending on number of columns that are being > rolled up). > I can't find another SQL on Hadoop engine that doesn't support this feature. > *Checked Spark, Hive, PIG, Flink and some other engines - they all do > support this basic SQL feature*. > Would be great to have a matching feature in Impala too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org