----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27955/ -----------------------------------------------------------
(Updated Nov. 13, 2014, 4:38 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Changes ------- Addressing RB comments. Bugs: HIVE-8842 https://issues.apache.org/jira/browse/HIVE-8842 Repository: hive-git Description ------- Enabling the SparkMapJoinResolver and SparkReduceSinkMapJoinProc, I see the following: explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src src3 ON (src1.key + src2.key = src3.key); Enabling the SparkMapJoinResolver and SparkReduceSinkMapJoinProc, I see the following: {noformat} explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src src3 ON (src1.key + src2.key = src3.key); {noformat} produces too many stages (six), and too many HashTableSink. {noformat} STAGE DEPENDENCIES: Stage-5 is a root stage Stage-4 depends on stages: Stage-5 Stage-3 depends on stages: Stage-4 Stage-7 is a root stage Stage-6 depends on stages: Stage-7 Stage-0 is a root stage STAGE PLANS: Stage: Stage-5 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3 Vertices: Map 1 Map Operator Tree: TableScan alias: src2 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) Stage: Stage-4 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2 Vertices: Map 3 Map Operator Tree: TableScan alias: src1 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) outputColumnNames: _col0, _col1, _col5, _col6 input vertices: 1 Map 1 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (_col0 + _col5) is not null (type: boolean) Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {_col0} {_col1} {_col5} {_col6} 1 {key} {value} keys: 0 (_col0 + _col5) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:1 Vertices: Map 2 Map Operator Tree: TableScan alias: src3 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col5} {_col6} 1 {key} {value} keys: 0 (_col0 + _col5) (type: double) 1 UDFToDouble(key) (type: double) outputColumnNames: _col0, _col1, _col5, _col6, _col10, _col11 input vertices: 0 Map 3 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string), _col6 (type: string), _col10 (type: string), _col11 (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-7 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3 Vertices: Map 1 Map Operator Tree: TableScan alias: src2 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) Stage: Stage-6 Spark DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2 Vertices: Map 3 Map Operator Tree: TableScan alias: src1 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 {key} {value} keys: 0 key (type: string) 1 key (type: string) outputColumnNames: _col0, _col1, _col5, _col6 input vertices: 1 Map 1 Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (_col0 + _col5) is not null (type: boolean) Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {_col0} {_col1} {_col5} {_col6} 1 {key} {value} keys: 0 (_col0 + _col5) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} Diffs (updated) ----- ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java a8b7ac6 Diff: https://reviews.apache.org/r/27955/diff/ Testing ------- Thanks, Chao Sun