Eugene Koifman created HIVE-15146:
-------------------------------------

             Summary: Too many Stats-Aggr Operator in multi-insert
                 Key: HIVE-15146
                 URL: https://issues.apache.org/jira/browse/HIVE-15146
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
            Reporter: Eugene Koifman
            Assignee: Pengcheng Xiong

Consider:
{noformat}
create table if not exists srcpart (a int, b int, c int)
  partitioned by (z int)
  clustered by (a) into 2 buckets
  stored as orc tblproperties("transactional"="true");

create temporary table if not exists data1 (x int);

insert into data1 values (1),(2),(3);

explain from data1
  insert into srcpart partition(z) select 0,0,1,x
  insert into srcpart partition(z=1) select 0,0,1;
{noformat}

Then the plan looks like:
{noformat}
2016-11-07T16:56:19,045 INFO [main] ql.TestTxnCommands2: STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-4 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-4
  Stage-5 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: data1
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: x (type: int)
              outputColumnNames: _col3
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                sort order:
                Map-reduce partition columns: 0 (type: int)
                Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col3 (type: int)
            Select Operator
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Reduce Operator Tree:
        Select Operator
          expressions: 0 (type: int), 0 (type: int), 1 (type: int), VALUE._col2 (type: int)
          outputColumnNames: _col0, _col1, _col2, _col3
          Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.srcpart

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            z
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.srcpart

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-4
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              sort order:
              Map-reduce partition columns: 0 (type: int)
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Select Operator
          expressions: 0 (type: int), 0 (type: int), 1 (type: int)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.srcpart

  Stage: Stage-1
    Move Operator
      tables:
          partition:
            z 1
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.srcpart

  Stage: Stage-5
    Stats-Aggr Operator
{noformat}

Note that there are 2 stats aggregation tasks (Stage-3 and Stage-5), but both branches of the multi-insert update the same partition.

Once HIVE-14943 is in, there will be other ways to generate the same situation.