[ https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222964#comment-14222964 ]
Rui Li commented on HIVE-8536:
------------------------------
With the latest patch, the query plan for the previous example is:
{noformat}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-4 depends on stages: Stage-1 , consists of Stage-3
Stage-3
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-1
Spark
Edges:
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 3 (PARTITION-LEVEL SORT, 1)
DagName: root_20141124135050_f1804bff-a38d-496e-92de-62d50567da1c:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: tiny
Statistics: Num rows: 2 Data size: 241 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: pagerank is not null (type: boolean)
Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: pagerank (type: int)
sort order: +
Map-reduce partition columns: pagerank (type: int)
Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
value expressions: pageurl (type: string), avgduration (type: int)
Map 3
Map Operator Tree:
TableScan
alias: rankings
Statistics: Num rows: 8922992 Data size: 963683136 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: pagerank is not null (type: boolean)
Statistics: Num rows: 4461496 Data size: 481841568 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: pagerank (type: int)
sort order: +
Map-reduce partition columns: pagerank (type: int)
Statistics: Num rows: 4461496 Data size: 481841568 Basic stats: COMPLETE Column stats: NONE
value expressions: pageurl (type: string), avgduration (type: int)
Reducer 2
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col1}
1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col1}
handleSkewJoin: true
outputColumnNames: _col0, _col1, _col2, _col6, _col7, _col8
Statistics: Num rows: 4907645 Data size: 530025736 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: int), _col2 (type: int), _col6 (type: string), _col7 (type: int), _col8 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
Statistics: Num rows: 4907645 Data size: 530025736 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 4907645 Data size: 530025736 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-4
Conditional Operator
Stage: Stage-3
Spark
Edges:
Map 4 <- Map 5 (NONE, 0)
DagName: root_20141124135050_f1804bff-a38d-496e-92de-62d50567da1c:2
Vertices:
Map 4
Map Operator Tree:
TableScan
Map Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {0_VALUE_0} {0_VALUE_1} {0_VALUE_2}
1 {1_VALUE_0} {1_VALUE_1} {1_VALUE_2}
keys:
0 reducesinkkey0 (type: int)
1 reducesinkkey0 (type: int)
outputColumnNames: _col0, _col1, _col2, _col6, _col7, _col8
Select Operator
expressions: _col0 (type: string), _col1 (type: int), _col2 (type: int), _col6 (type: string), _col7 (type: int), _col8 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Map 5
Map Operator Tree:
TableScan
Spark HashTable Sink Operator
condition expressions:
0 {0_VALUE_0} {0_VALUE_1} {0_VALUE_2}
1 {1_VALUE_0} {1_VALUE_1} {1_VALUE_2}
keys:
0 reducesinkkey0 (type: int)
1 reducesinkkey0 (type: int)
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{noformat}
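For reference, a plan of this shape could come from a session roughly like the sketch below. This is a reconstruction from the plan itself, not the exact statements that were run: the join condition and the select list are inferred from the table aliases (tiny, rankings) and the join key (pagerank), and the two settings are the standard Hive properties for selecting the Spark engine and enabling skew join handling.
{noformat}
-- Assumed session settings (not taken from the original comment)
set hive.execution.engine=spark;
set hive.optimize.skewjoin=true;

-- Hypothetical query inferred from the plan: inner join on pagerank
explain
select *
from tiny join rankings
on tiny.pagerank = rankings.pagerank;
{noformat}
With skew join enabled, the Join Operator in Stage-1 is marked handleSkewJoin: true, and the conditional Stage-4 decides at runtime whether the follow-up map join in Stage-3 is needed for the skewed keys.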
> Enable SkewJoinResolver for spark [Spark Branch]
> ------------------------------------------------
>
> Key: HIVE-8536
> URL: https://issues.apache.org/jira/browse/HIVE-8536
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Reporter: Rui Li
> Assignee: Rui Li
> Attachments: HIVE-8536.1-spark.patch, HIVE-8536.2-spark.patch
>
>
> Sub-task of HIVE-8406