[ https://issues.apache.org/jira/browse/HIVE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201198#comment-14201198 ]
Suhas Satish commented on HIVE-8700:
------------------------------------
I have a patch that now generates the HashTableSinkOperators as follows. I will
upload it soon.
{code}
explain select table1.key, table2.value, table3.value from table1 join table2 on table1.key=table2.key join table3 on table1.key=table3.key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Map 3 <- Map 1 (NONE, 0), Map 2 (NONE, 0)
      DagName: ssatish_20141106152828_299c0f54-40a8-4cf5-91f4-ecb1f420955f:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: table1
                  Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 727 Data size: 2908 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
        Map 2
            Map Operator Tree:
                TableScan
                  alias: table3
                  Statistics: Num rows: 2 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 108 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
        Map 3
            Map Operator Tree:
                TableScan
                  alias: table2
                  Statistics: Num rows: 55 Data size: 5791 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 28 Data size: 2948 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                           Inner Join 0 to 2
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
                      outputColumnNames: _col0, _col6, _col11
                      input vertices:
                        0 Map 1
                        2 Map 2
                      Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: int), _col6 (type: string), _col11 (type: string)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
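In this plan, Map 1 and Map 2 turn the small tables (table1, table3) into hash tables. Map 3 then streams table2 and probes those tables in its Map Join Operator. Tags 0 and 2 are the small sides listed under "input vertices", and tag 1 is the streamed big table.

The Java code below is a minimal in-memory sketch of those semantics. It assumes simplified (Integer key, String value) rows. `MapJoinSketch`, `Row`, `buildHashTable` and `mapJoin` are hypothetical names chosen for illustration. They are not Hive's HashTableSinkOperator or MapJoinOperator API, and the sketch does not model how the hash tables are serialized and broadcast to Spark tasks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of the map join in the plan; not Hive's operator API.
class MapJoinSketch {

    // Hypothetical row type: a nullable int join key and a string value column.
    record Row(Integer key, String value) {}

    // Build side (the role of HashTable Sink Operator in Map 1 and Map 2):
    // index a small table by join key, dropping null keys like the Filter Operator does.
    static Map<Integer, List<Row>> buildHashTable(List<Row> smallTable) {
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row r : smallTable) {
            if (r.key() == null) {
                continue;
            }
            table.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        return table;
    }

    // Probe side (the role of Map Join Operator in Map 3): stream the big table (tag 1)
    // and look up tags 0 and 2. Inner joins emit a row only when every side matches.
    // Each output row mirrors "select table1.key, table2.value, table3.value".
    static List<String> mapJoin(List<Row> bigTable,
                                Map<Integer, List<Row>> tag0,
                                Map<Integer, List<Row>> tag2) {
        List<String> out = new ArrayList<>();
        for (Row big : bigTable) {
            if (big.key() == null) {
                continue;
            }
            List<Row> left = tag0.getOrDefault(big.key(), List.of());
            List<Row> right = tag2.getOrDefault(big.key(), List.of());
            for (Row a : left) {
                for (Row c : right) {
                    out.add(a.key() + "," + big.value() + "," + c.value());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> table1 = List.of(new Row(1, "a1"), new Row(2, "a2"));
        List<Row> table2 = List.of(new Row(1, "b1"), new Row(2, "b2"), new Row(3, "b3"), new Row(null, "bn"));
        List<Row> table3 = List.of(new Row(1, "c1"), new Row(3, "c3"));
        // Each small table is hashed once; the big table is only streamed.
        System.out.println(mapJoin(table2, buildHashTable(table1), buildHashTable(table3)));
    }
}
```

Running `main` prints `[1,b1,c1]`. Key 2 has no match in table3 and key 3 has no match in table1, so the inner joins drop both. Only the small sides are materialized as hash tables, which is why this strategy relies on them fitting in memory on each task.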
> Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
> ------------------------------------------------------------------------------
>
> Key: HIVE-8700
> URL: https://issues.apache.org/jira/browse/HIVE-8700
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Suhas Satish
> Attachments: HIVE-8700-spark.patch, HIVE-8700.patch
>
>
> With HIVE-8616 enabled, the new plan has a ReduceSinkOperator for the small
> tables. For example, the following is the operator plan for the small
> table dec1, derived from the query {code}explain select /*+ MAPJOIN(dec)*/ * from
> dec join dec1 on dec.value=dec1.d;{code}
> {code}
> Map 2
>     Map Operator Tree:
>         TableScan
>           alias: dec1
>           Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL Column stats: NONE
>           Filter Operator
>             predicate: d is not null (type: boolean)
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>             Reduce Output Operator
>               key expressions: d (type: decimal(5,2))
>               sort order: +
>               Map-reduce partition columns: d (type: decimal(5,2))
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>               value expressions: i (type: int)
> {code}
> With the new design for broadcasting small tables, we need to replace the
> ReduceSinkOperator with a HashTableSinkOperator or equivalent in the new plan.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)