[jira] [Commented] (HIVE-15581) Unable to use advanced aggregation with multiple inserts clause

Carter Shanklin (JIRA) Sun, 15 Jan 2017 08:28:25 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823178#comment-15823178
 ]


Carter Shanklin commented on HIVE-15581:
----------------------------------------

Have you tried it with Hive on Tez? Not saying it will help but if the problem 
is specific to Hive on MapReduce it's not likely it will ever be fixed.

> Unable to use advanced aggregation with multiple inserts clause
> ---------------------------------------------------------------
>
>                 Key: HIVE-15581
>                 URL: https://issues.apache.org/jira/browse/HIVE-15581
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: James Ball
>              Labels: newbie
>
> ■Use Cases
> - Use multiple insert clauses within a single query to insert multiple static 
> (user-defined) partitions into a single table.
> - Use advanced aggregation (cube) features within each insert clause to 
> include subtotals of columns for each partition
> ■Expected Behaviour
> - Subtotals are inserted for all combinations of the set of columns
> ■Observed Behaviour
> - No subtotals are not inserted for any combination of the set of columns
> ■Sample Queries
> {code:sql}
> // Create test tables
> create table if not exists
>       table1
>       (
>               column1 string,
>               column2 string,
>               column3 int
>               )
>       stored as orc
>       tblproperties
>       (
>               "orc.compress" = "SNAPPY"
>               );
> create table if not exists
>       table2
>       (
>               column1 string,
>               column2 string,
>               column3 int
>               )
>       partitioned by
>       (
>               partition1 string
>               )
>       stored as orc
>       tblproperties
>       (
>               "orc.compress" = "SNAPPY"
>               );
> create table if not exists
>       table3
>       (
>               column1 string,
>               column2 string,
>               column3 int
>               )
>       partitioned by
>       (
>               partition1 string
>               )
>       stored as orc
>       tblproperties
>       (
>               "orc.compress" = "SNAPPY"
>               );
> {code}
> {code:sql}
> // Insert test values
> insert overwrite table
>       table1
>       values
>               ('value1', 'value1', 1),
>               ('value2', 'value2', 1),
>               ('value3', 'value3', 1);
> {code}
> {code:sql}
> // Single insert clause with multiple inserts syntax
> // Subtotals are inserted into target table
> from
>       table1
> insert overwrite table
>       table2
>       partition
>       (
>               partition1 = 'value1'
>               )
>       select
>               column1,
>               column2,
>               sum(column3) as column3
>       group by
>               column1,
>               column2
>       with cube;
> {code}
> {code:sql}
> // Multiple insert clauses with multiple inserts syntax
> // Subtotals are not inserted into target table
> from
>       table1
> insert overwrite table
>       table3
>       partition
>       (
>               partition1 = 'value1'
>               )
>       select
>               column1,
>               column2,
>               sum(column3) as column3
>       group by
>               column1,
>               column2
>       with cube
> insert overwrite table
>       table3
>       partition
>       (
>               partition1 = 'value2'
>               )
>       select
>               column1,
>               column2,
>               sum(column3) as column3
>       group by
>               column1,
>               column2
>       with cube;
> {code}
> ■Executions Plans
> - Single insert clause with multiple inserts syntax
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
>   Stage-2 depends on stages: Stage-0
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: table1
>             Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
> Column stats: NONE
>             Select Operator
>               expressions: column1 (type: string), column2 (type: string), 
> column3 (type: int)
>               outputColumnNames: column1, column2, column3
>               Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
> Column stats: NONE
>               Group By Operator
>                 aggregations: sum(column3)
>                 keys: column1 (type: string), column2 (type: string), '0' 
> (type: string)
>                 mode: hash
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 12 Data size: 2208 Basic stats: 
> COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: string), _col1 (type: 
> string), _col2 (type: string)
>                   sort order: +++
>                   Map-reduce partition columns: _col0 (type: string), _col1 
> (type: string), _col2 (type: string)
>                   Statistics: Num rows: 12 Data size: 2208 Basic stats: 
> COMPLETE Column stats: NONE
>                   value expressions: _col3 (type: bigint)
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: sum(VALUE._col0)
>           keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 
> (type: string)
>           mode: mergepartial
>           outputColumnNames: _col0, _col1, _col3
>           Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE 
> Column stats: NONE
>           pruneGroupingSetId: true
>           Select Operator
>             expressions: _col0 (type: string), _col1 (type: string), 
> UDFToInteger(_col3) (type: int)
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE 
> Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE 
> Column stats: NONE
>               table:
>                   input format: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                   output format: 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                   serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                   name: zzz_james_ball.table2
>   Stage: Stage-0
>     Move Operator
>       tables:
>           partition:
>             partition1 value1
>           replace: true
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: zzz_james_ball.table2
>   Stage: Stage-2
>     Stats-Aggr Operator
> {noformat}
> - Single insert clause with multiple inserts syntax
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
>   Stage-1 depends on stages: Stage-2
>   Stage-4 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: table1
>             Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
> Column stats: NONE
>             Select Operator
>               expressions: column1 (type: string), column2 (type: string), 
> column3 (type: int)
>               outputColumnNames: column1, column2, column3
>               Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
> Column stats: NONE
>               Reduce Output Operator
>                 key expressions: column1 (type: string), column2 (type: 
> string)
>                 sort order: ++
>                 Map-reduce partition columns: column1 (type: string), column2 
> (type: string)
>                 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
> Column stats: NONE
>                 value expressions: column3 (type: int)
>       Reduce Operator Tree:
>         Forward
>           Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column 
> stats: NONE
>           Group By Operator
>             aggregations: sum(VALUE._col0)
>             keys: KEY._col0 (type: string), KEY._col1 (type: string)
>             mode: complete
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>             Select Operator
>               expressions: _col0 (type: string), _col1 (type: string), 
> UDFToInteger(_col2) (type: int)
>               outputColumnNames: _col0, _col1, _col2
>               Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>                 table:
>                     input format: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                     output format: 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                     serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                     name: zzz_james_ball.table3
>           Group By Operator
>             aggregations: sum(VALUE._col0)
>             keys: KEY._col0 (type: string), KEY._col1 (type: string)
>             mode: complete
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>             Select Operator
>               expressions: _col0 (type: string), _col1 (type: string), 
> UDFToInteger(_col2) (type: int)
>               outputColumnNames: _col0, _col1, _col2
>               Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>                 table:
>                     input format: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                     output format: 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                     serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                     name: zzz_james_ball.table3
>   Stage: Stage-0
>     Move Operator
>       tables:
>           partition:
>             partition1 value1
>           replace: true
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: zzz_james_ball.table3
>   Stage: Stage-3
>     Stats-Aggr Operator
>   Stage: Stage-1
>     Move Operator
>       tables:
>           partition:
>             partition1 value2
>           replace: true
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: zzz_james_ball.table3
>   Stage: Stage-4
>     Stats-Aggr Operator
> {noformat}
> ■Notes
> - This problem occurs with all advanced aggregation features (cube, grouping 
> sets, rollup)
> - This problem occurs whether hive.map.aggr is set to true or false
> - Dynamic partitions are not used because the partition values are set 
> manually within the where conditions of each insert clause



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15581) Unable to use advanced aggregation with multiple inserts clause

Reply via email to