-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27955/
-----------------------------------------------------------
(Updated Nov. 13, 2014, 4:38 a.m.)
Review request for hive, Szehon Ho and Xuefu Zhang.
Changes
-------
Addressing RB comments.
Bugs: HIVE-8842
https://issues.apache.org/jira/browse/HIVE-8842
Repository: hive-git
Description
-------
With SparkMapJoinResolver and SparkReduceSinkMapJoinProc enabled, the following
query:
{noformat}
explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src
src3 ON (src1.key + src2.key = src3.key);
{noformat}
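produces too many stages (six) and too many HashTableSink operators (four).
Stage-7 and Stage-6 are duplicates of Stage-5 and Stage-4, and no other stage
depends on Stage-6: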
{noformat}
STAGE DEPENDENCIES:
  Stage-5 is a root stage
  Stage-4 depends on stages: Stage-5
  Stage-3 depends on stages: Stage-4
  Stage-7 is a root stage
  Stage-6 depends on stages: Stage-7
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-5
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: src2
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)

  Stage: Stage-4
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
      Vertices:
        Map 3
            Map Operator Tree:
                TableScan
                  alias: src1
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)
                      outputColumnNames: _col0, _col1, _col5, _col6
                      input vertices:
                        1 Map 1
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: (_col0 + _col5) is not null (type: boolean)
                        Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
                        HashTable Sink Operator
                          condition expressions:
                            0 {_col0} {_col1} {_col5} {_col6}
                            1 {key} {value}
                          keys:
                            0 (_col0 + _col5) (type: double)
                            1 UDFToDouble(key) (type: double)

  Stage: Stage-3
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:1
      Vertices:
        Map 2
            Map Operator Tree:
                TableScan
                  alias: src3
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: UDFToDouble(key) is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {_col0} {_col1} {_col5} {_col6}
                        1 {key} {value}
                      keys:
                        0 (_col0 + _col5) (type: double)
                        1 UDFToDouble(key) (type: double)
                      outputColumnNames: _col0, _col1, _col5, _col6, _col10, _col11
                      input vertices:
                        0 Map 3
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string), _col6 (type: string), _col10 (type: string), _col11 (type: string)
                        outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                        Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-7
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: src2
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)

  Stage: Stage-6
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
      Vertices:
        Map 3
            Map Operator Tree:
                TableScan
                  alias: src1
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)
                      outputColumnNames: _col0, _col1, _col5, _col6
                      input vertices:
                        1 Map 1
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: (_col0 + _col5) is not null (type: boolean)
                        Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
                        HashTable Sink Operator
                          condition expressions:
                            0 {_col0} {_col1} {_col5} {_col6}
                            1 {key} {value}
                          keys:
                            0 (_col0 + _col5) (type: double)
                            1 UDFToDouble(key) (type: double)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}
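As a quick sanity check on the STAGE DEPENDENCIES list, here is a minimal Java sketch that parses those lines and reports the stages no other stage depends on. The class `StageDeps` and its methods are hypothetical illustrations, not part of Hive; it only assumes the "is a root stage" / "depends on stages:" line shapes seen in this plan.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical helper (not part of Hive): parses the "STAGE DEPENDENCIES"
// lines of an EXPLAIN output and finds stages that nothing else depends on.
public class StageDeps {
    private static final Pattern ROOT = Pattern.compile("(Stage-\\d+) is a root stage");
    private static final Pattern DEP = Pattern.compile("(Stage-\\d+) depends on stages: (.+)");

    /** Maps each stage to the stages it depends on, in input order. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            Matcher m = ROOT.matcher(line);
            if (m.matches()) {
                // Root stages have no parents.
                deps.put(m.group(1), new ArrayList<>());
                continue;
            }
            m = DEP.matcher(line);
            if (m.matches()) {
                List<String> parents = new ArrayList<>();
                for (String p : m.group(2).split(",")) {
                    parents.add(p.trim());
                }
                deps.put(m.group(1), parents);
            }
        }
        return deps;
    }

    /** Stages that appear in no other stage's dependency list, sorted by name. */
    public static Set<String> sinks(Map<String, List<String>> deps) {
        Set<String> result = new TreeSet<>(deps.keySet());
        for (List<String> parents : deps.values()) {
            result.removeAll(parents);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Stage-5 is a root stage",
            "Stage-4 depends on stages: Stage-5",
            "Stage-3 depends on stages: Stage-4",
            "Stage-7 is a root stage",
            "Stage-6 depends on stages: Stage-7",
            "Stage-0 is a root stage");
        Map<String, List<String>> deps = parse(lines);
        System.out.println("stages: " + deps.size());
        System.out.println("sinks: " + sinks(deps));
    }
}
```

Running it prints `stages: 6` and `sinks: [Stage-0, Stage-3, Stage-6]`. Stage-3 writes the query result and Stage-0 fetches it, but Stage-6 ends in a HashTable Sink Operator whose hash table no later stage loads, which is consistent with Stage-7/Stage-6 being redundant copies of Stage-5/Stage-4.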
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java a8b7ac6
Diff: https://reviews.apache.org/r/27955/diff/
Testing
-------
Thanks,
Chao Sun