dengzhhu653 commented on a change in pull request #2585:
URL: https://github.com/apache/hive/pull/2585#discussion_r813487836
##########
File path: ql/src/test/results/clientpositive/llap/partition_distinct_skew.q.out
##########
@@ -0,0 +1,261 @@
+PREHOOK: query: create table partition_distinct_skew(col1 string, col2 string)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@partition_distinct_skew
+POSTHOOK: query: create table partition_distinct_skew(col1 string, col2 string)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@partition_distinct_skew
+PREHOOK: query: insert into table partition_distinct_skew values('a', 'b'), ('a', 'a'), ('a', 'b')
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@partition_distinct_skew
+POSTHOOK: query: insert into table partition_distinct_skew values('a', 'b'), ('a', 'a'), ('a', 'b')
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@partition_distinct_skew
+POSTHOOK: Lineage: partition_distinct_skew.col1 SCRIPT []
+POSTHOOK: Lineage: partition_distinct_skew.col2 SCRIPT []
+PREHOOK: query: select col1, col2 from partition_distinct_skew
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: select col1, col2 from partition_distinct_skew
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+a b
+a a
+a b
+PREHOOK: query: explain select col1, count(distinct col2), count(col2) from partition_distinct_skew group by col1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: explain select col1, count(distinct col2), count(col2) from partition_distinct_skew group by col1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Tez
+#### A masked pattern was here ####
+ Edges:
+ Reducer 2 <- Map 1 (SIMPLE_EDGE)
+ Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+ Vertices:
+ Map 1
+ Map Operator Tree:
+ TableScan
+ alias: partition_distinct_skew
+ Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: col1 (type: string), col2 (type: string)
+ outputColumnNames: col1, col2
+ Statistics: Num rows: 3 Data size: 510 Basic stats: COMPLETE Column stats: COMPLETE
+ Group By Operator
+ aggregations: count(DISTINCT col2), count(col2)
+ keys: col1 (type: string), col2 (type: string)
+ minReductionHashAggr: 0.4
+ mode: hash
+ outputColumnNames: _col0, _col1, _col2, _col3
+ Statistics: Num rows: 2 Data size: 372 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string), _col1 (type: string)
+ null sort order: zz
+ sort order: ++
+ Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
+ Statistics: Num rows: 2 Data size: 372 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: _col3 (type: bigint)
+ Execution mode: vectorized, llap
+ LLAP IO: all inputs
+ Reducer 2
+ Execution mode: llap
+ Reduce Operator Tree:
+ Group By Operator
+ aggregations: count(DISTINCT KEY._col1:0._col0), count(VALUE._col1)
+ keys: KEY._col0 (type: string)
+ mode: partials
+ outputColumnNames: _col0, _col1, _col2
+ Statistics: Num rows: 2 Data size: 202 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: _col0 (type: string)
+ null sort order: z
+ sort order: +
+ Map-reduce partition columns: _col0 (type: string)
+ Statistics: Num rows: 2 Data size: 202 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: _col1 (type: bigint), _col2 (type: bigint)
+ Reducer 3
+ Execution mode: llap
+ Reduce Operator Tree:
+ Group By Operator
+ aggregations: count(VALUE._col0), count(VALUE._col1)
+ keys: KEY._col0 (type: string)
+ mode: final
+ outputColumnNames: _col0, _col1, _col2
+ Statistics: Num rows: 1 Data size: 101 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 1 Data size: 101 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: select col1, count(distinct col2), count(col2) from partition_distinct_skew group by col1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: select col1, count(distinct col2), count(col2) from partition_distinct_skew group by col1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+a 2 3
+PREHOOK: query: explain select col1, count(distinct col2) from partition_distinct_skew group by col1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+POSTHOOK: query: explain select col1, count(distinct col2) from partition_distinct_skew group by col1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@partition_distinct_skew
+#### A masked pattern was here ####
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Tez
+#### A masked pattern was here ####
+ Edges:
+ Reducer 2 <- Map 1 (SIMPLE_EDGE)
+ Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+ Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
+ Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
+#### A masked pattern was here ####
Review comment:
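The plan for `select col1, count(distinct col2) from partition_distinct_skew group by col1` introduces redundant reducers: it chains four reducers (Reducer 2 through Reducer 5), while the plan above for the same query with an additional `count(col2)` needs only two (Reducer 2 and Reducer 3).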
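
To make the comparison concrete, here is a minimal sketch that counts the reducer vertices named in each `Edges:` block of the two plans quoted above. The helper `count_reducers` and the two constants are hypothetical names used only for illustration; they are not part of Hive or of this PR.

```python
import re


def count_reducers(edges_text: str) -> int:
    """Count the distinct 'Reducer N' vertices named in a Tez 'Edges:' block."""
    return len(set(re.findall(r"Reducer \d+", edges_text)))


# Edges copied from the two EXPLAIN outputs in the diff above.
PLAN_WITH_COUNT = """
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
"""

PLAN_DISTINCT_ONLY = """
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
"""

print(count_reducers(PLAN_WITH_COUNT))     # 2
print(count_reducers(PLAN_DISTINCT_ONLY))  # 4
```

The simpler query ends up with twice as many reducer stages as the one that computes an extra aggregate.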
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]