lotan created HIVE-26678:
----------------------------
Summary: In the filter criteria associated with multiple tables,
the filter result of the subquery by not in or in is incorrect.
Key: HIVE-26678
URL: https://issues.apache.org/jira/browse/HIVE-26678
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 3.1.0
Reporter: lotan
Fix For: 4.0.0
create testtable as follow:
create table test101 (id string,id2 string);
create table test102 (id string,id2 string);
create table test103 (id string,id2 string);
create table test104 (id string,id2 string);
when cbo is false,run the following SQL statement:
explain select count(1) from test101 t1
left join test102 t2 on t1.id=t2.id
left join test103 t3 on t1.id=t3.id2
where t1.id in (select s.id from test104 s)
and t3.id2='123';
you will see:
The filter criteria in the right table are lost.
The execution plan is as follows:
+-----------------------------------------------------------------------------------------------------+
| Explain
|
+-----------------------------------------------------------------------------------------------------+
| STAGE DEPENDENCIES:
|
| Stage-9 is a root stage
|
| Stage-3 depends on stages: Stage-9
|
| Stage-0 depends on stages: Stage-3
|
|
|
| STAGE PLANS:
|
| Stage: Stage-9
|
| Map Reduce Local Work
|
| Alias -> Map Local Tables:
|
| sq_1:s
|
| Fetch Operator
|
| limit: -1
|
| t2
|
| Fetch Operator
|
| limit: -1
|
| t3
|
| Fetch Operator
|
| limit: -1
|
| Alias -> Map Local Operator Tree:
|
| sq_1:s
|
| TableScan
|
| alias: s
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column
stats: NONE |
| Filter Operator
|
| predicate: id is not null (type: boolean)
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL
Column stats: NONE |
| Select Operator
|
| expressions: id (type: string)
|
| outputColumnNames: _col0
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL
Column stats: NONE |
| Group By Operator
|
| keys: _col0 (type: string)
|
| mode: hash
|
| outputColumnNames: _col0
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL
Column stats: NONE |
| HashTable Sink Operator
|
| keys:
|
| 0 _col0 (type: string)
|
| 1 _col0 (type: string)
|
| t2
|
| TableScan
|
| alias: t2
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column
stats: NONE |
| Filter Operator
|
| predicate: id is not null (type: boolean)
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL
Column stats: NONE |
| HashTable Sink Operator
|
| keys:
|
| 0 id (type: string)
|
| 1 id (type: string)
|
| 2 id2 (type: string)
|
| t3
|
| TableScan
|
| alias: t3
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column
stats: NONE |
| Filter Operator
|
| predicate: id2 is not null (type: boolean)
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL
Column stats: NONE |
| HashTable Sink Operator
|
| keys:
|
| 0 id (type: string)
|
| 1 id (type: string)
|
| 2 id2 (type: string)
|
|
|
| Stage: Stage-3
|
| Map Reduce
|
| Map Operator Tree:
|
| TableScan
|
| alias: t1
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column
stats: NONE |
| Filter Operator
|
| predicate: id is not null (type: boolean)
|
| Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL
Column stats: NONE |
| Map Join Operator
|
| condition map:
|
| Left Outer Join 0 to 1
|
| Left Outer Join 0 to 2
|
| keys:
|
| 0 id (type: string)
|
| 1 id (type: string)
|
| 2 id2 (type: string)
|
| outputColumnNames: _col0
|
| Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL
Column stats: NONE |
| Map Join Operator
|
| condition map:
|
| Left Semi Join 0 to 1
|
| keys:
|
| 0 _col0 (type: string)
|
| 1 _col0 (type: string)
|
| Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL
Column stats: NONE |
| Group By Operator
|
| aggregations: count(1)
|
| mode: hash
|
| outputColumnNames: _col0
|
| Statistics: Num rows: 1 Data size: 8 Basic stats: PARTIAL
Column stats: NONE |
| Reduce Output Operator
|
| sort order:
|
| Statistics: Num rows: 1 Data size: 8 Basic stats:
PARTIAL Column stats: NONE |
| value expressions: _col0 (type: bigint)
|
| Local Work:
|
+-----------------------------------------------------------------------------------------------------+
| Explain
|
+-----------------------------------------------------------------------------------------------------+
| Map Reduce Local Work
|
| Reduce Operator Tree:
|
| Group By Operator
|
| aggregations: count(VALUE._col0)
|
| mode: mergepartial
|
| outputColumnNames: _col0
|
| Statistics: Num rows: 1 Data size: 8 Basic stats: PARTIAL Column
stats: NONE |
| File Output Operator
|
| compressed: false
|
| Statistics: Num rows: 1 Data size: 8 Basic stats: PARTIAL Column
stats: NONE |
| table:
|
| input format:
org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
|
|
|
| Stage: Stage-0
|
| Fetch Operator
|
| limit: -1
|
| Processor Tree:
|
| ListSink
|
|
|
+-----------------------------------------------------------------------------------------------------+
--
This message was sent by Atlassian Jira
(v8.20.10#820010)