[ https://issues.apache.org/jira/browse/TAJO-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340095#comment-14340095 ]
Jihoon Son commented on TAJO-1361:
----------------------------------
Here is Hive's execution plan for the same query.
{noformat}
hive> explain select * from
> (select * from test_a where status='regist')a
> left outer join ( select * from test_a where status='start')b
> on a.id=b.id and a.id_detail =b.id_detail
> where b.id is null and b.id_detail is null;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
b:test_a
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
b:test_a
TableScan
alias: test_a
Statistics: Num rows: 2 Data size: 32 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (status = 'start') (type: boolean)
Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: id (type: int), id_detail (type: int), 'start' (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
condition expressions:
0 {_col0} {_col1} {_col2}
1 {_col2}
keys:
0 _col0 (type: int), _col1 (type: int)
1 _col0 (type: int), _col1 (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: test_a
Statistics: Num rows: 2 Data size: 32 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (status = 'regist') (type: boolean)
Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: id (type: int), id_detail (type: int), 'regist' (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
condition expressions:
0 {_col0} {_col1} {_col2}
1 {_col0} {_col1} {_col2}
keys:
0 _col0 (type: int), _col1 (type: int)
1 _col0 (type: int), _col1 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
Statistics: Num rows: 1 Data size: 17 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (_col3 is null and _col4 is null) (type: boolean)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), null (type: void), null (type: void), _col5 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{noformat}
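The plan above builds a hash table from the 'start' rows (Stage-4), probes it with the 'regist' rows in a left outer map join, and only then applies the IS NULL filter (Stage-3). The following is a minimal Python sketch of that operator order, not Hive or Tajo code. The function name and the in-memory row list are hypothetical; the rows are the three lines of /test/test.tbl from the issue description.

```python
# Rows of test_a from the issue description: (id, id_detail, status).
TEST_A = [(1, 1, "regist"), (1, 2, "regist"), (1, 1, "start")]


def left_outer_join_then_filter(rows):
    """Hypothetical helper mirroring the operator order of the Hive plan."""
    # Stage-4: scan b, keep status = 'start', hash rows on (id, id_detail).
    hash_table = {}
    for row in rows:
        if row[2] == "start":
            hash_table.setdefault((row[0], row[1]), []).append(row)

    # Stage-3: scan a, keep status = 'regist', probe the hash table.
    joined = []
    for row in rows:
        if row[2] != "regist":
            continue
        matches = hash_table.get((row[0], row[1]))
        if matches:
            joined.extend(row + match for match in matches)
        else:
            # Left outer join: pad an unmatched left row with NULLs.
            joined.append(row + (None, None, None))

    # Filter Operator placed after the join: _col3 is null and _col4 is null.
    return [r for r in joined if r[3] is None and r[4] is None]


result = left_outer_join_then_filter(TEST_A)
print(result)  # [(1, 2, 'regist', None, None, None)]
```

Because the filter runs on the joined rows, the matched row (1, 1, regist, 1, 1, start) is dropped and only the NULL-padded row survives.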
> Unexpected outer join behaviours
> --------------------------------
>
> Key: TAJO-1361
> URL: https://issues.apache.org/jira/browse/TAJO-1361
> Project: Tajo
> Issue Type: Bug
> Components: planner/optimizer
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Priority: Critical
>
> This bug was reported on the Apache Tajo Korea User Group:
> https://groups.google.com/forum/#!topic/tajo-user-kr/srFllmbThG0
> It can be reproduced as follows.
> {noformat}
> default> \dfs -cat /test/test.tbl
> 1,1,regist
> 1,2,regist
> 1,1,start
> default> create external table test_a ( id int , id_detail int , status text)
> using text with ('csvfile.delimiter'=',') location '/test';
> OK
> default> select * from
> > (select * from test_a where status='regist')a
> > left outer join ( select * from test_a where status='start')b
> > on a.id=b.id and a.id_detail =b.id_detail
> > where b.id is null and b.id_detail is null;
> Progress: 100%, response time: 1.57 sec
> id, id_detail, status, id, id_detail, status
> -------------------------------
> 1, 1, regist, 1, 1, start
> 1, 2, regist, , ,
> (2 rows, 1.57 sec, 37 B selected)
> {noformat}
> The expected query result is:
> {noformat}
> id, id_detail, status, id, id_detail, status
> 1, 2, regist, , ,
> {noformat}
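The expected result can be cross-checked against another SQL engine. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption for illustration only and is neither Tajo nor Hive. It loads the same three rows into an in-memory table and runs the query from the report.

```python
import sqlite3

# In-memory SQLite database used only as a reference engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_a (id INTEGER, id_detail INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO test_a VALUES (?, ?, ?)",
    [(1, 1, "regist"), (1, 2, "regist"), (1, 1, "start")],
)

# Same query as in the report: left outer join, then keep unmatched rows.
QUERY = """
SELECT * FROM
  (SELECT * FROM test_a WHERE status = 'regist') a
  LEFT OUTER JOIN (SELECT * FROM test_a WHERE status = 'start') b
  ON a.id = b.id AND a.id_detail = b.id_detail
WHERE b.id IS NULL AND b.id_detail IS NULL
"""

rows = conn.execute(QUERY).fetchall()
conn.close()
print(rows)  # [(1, 2, 'regist', None, None, None)]
```

Only the row with no matching 'start' record is returned, which agrees with the expected result quoted above.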