Jesus Camacho Rodriguez created HIVE-13693:
----------------------------------------------
Summary: Multi-insert query drops Filter before file output when
there is a.val <> b.val
Key: HIVE-13693
URL: https://issues.apache.org/jira/browse/HIVE-13693
Project: Hive
Issue Type: Bug
Components: Logical Optimizer
Affects Versions: 2.0.0, 1.3.0, 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
To reproduce:
{noformat}
CREATE TABLE T_A ( id STRING, val STRING );
CREATE TABLE T_B ( id STRING, val STRING );
CREATE TABLE join_result_1 ( ida STRING, vala STRING, idb STRING, valb STRING
);
CREATE TABLE join_result_3 ( ida STRING, vala STRING, idb STRING, valb STRING
);
INSERT INTO TABLE T_A
VALUES ('Id_1', 'val_101'), ('Id_2', 'val_102'), ('Id_3', 'val_103');
INSERT INTO TABLE T_B
VALUES ('Id_1', 'val_103'), ('Id_2', 'val_104');
explain
FROM T_A a LEFT JOIN T_B b ON a.id = b.id
INSERT OVERWRITE TABLE join_result_1
SELECT a.*, b.*
WHERE b.id = 'Id_1' AND b.val = 'val_103'
INSERT OVERWRITE TABLE join_result_3
SELECT a.*, b.*
WHERE b.val = 'val_104' AND b.id = 'Id_2' AND a.val <> b.val;
{noformat}
The (wrong) plan is the following:
{noformat}
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-3 depends on stages: Stage-2
Stage-0 depends on stages: Stage-3
Stage-4 depends on stages: Stage-0
Stage-1 depends on stages: Stage-3
Stage-5 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Tez
DagId: haha_20160504140944_174465c9-5d1a-42f9-9665-fae02eeb2767:2
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 3 Data size: 36 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 3 Data size: 36 Basic stats: COMPLETE
Column stats: NONE
value expressions: val (type: string)
Map 3
Map Operator Tree:
TableScan
alias: b
Statistics: Num rows: 2 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 2 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
value expressions: val (type: string)
Reducer 2
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 id (type: string)
1 id (type: string)
outputColumnNames: _col0, _col1, _col6
Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string),
'Id_1' (type: string), 'val_103' (type: string)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE
Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: bugtest2.join_result_1
Filter Operator
predicate: (_col1 <> _col6) (type: boolean)
Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string),
'Id_2' (type: string), 'val_104' (type: string)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 3 Data size: 39 Basic stats:
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: bugtest2.join_result_3
Stage: Stage-3
Dependency Collection
Stage: Stage-0
Move Operator
tables:
replace: true
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: bugtest2.join_result_1
Stage: Stage-4
Stats-Aggr Operator
Stage: Stage-1
Move Operator
tables:
replace: true
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: bugtest2.join_result_3
Stage: Stage-5
Stats-Aggr Operator
{noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)