Pavan Srinivas created HIVE-9753: ------------------------------------ Summary: Wrong results when using multiple levels of Joins. When table alias of one of the table is null with left outer joins. Key: HIVE-9753 URL: https://issues.apache.org/jira/browse/HIVE-9753 Project: Hive Issue Type: Bug Reporter: Pavan Srinivas Priority: Critical
Let take scenario, where the tables are: {code} drop table table1; CREATE TABLE table1( col1 string, col2 string, col3 string, col4 string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; drop table table2; CREATE TABLE table2( col1 string, col2 bigint, col3 string, col4 string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; drop table table3; CREATE TABLE table3( col1 string, col2 int, col3 int, col4 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} Query with wrong results: {code} SELECT t1.col1 AS dummy, t1.expected_column AS expected_column, t2.col4 FROM ( SELECT col1, '23-11111', '23-13' as three, col4 AS expected_column FROM table1 ) t1 JOIN table2 t2 ON cast(t2.col1 as string) = cast(t1.col1 as string) LEFT OUTER JOIN (SELECT col4, col1 FROM table3 ) t3 ON t2.col4 = t3.col1 ; {code} and explain output: {code} STAGE DEPENDENCIES: Stage-7 is a root stage Stage-5 depends on stages: Stage-7 Stage-0 depends on stages: Stage-5 STAGE PLANS: Stage: Stage-7 Map Reduce Local Work Alias -> Map Local Tables: t1:table1 Fetch Operator limit: -1 t3:table3 Fetch Operator limit: -1 Alias -> Map Local Operator Tree: t1:table1 TableScan alias: table1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Filter Operator predicate: col1 is not null (type: boolean) Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator expressions: col1 (type: string) outputColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE HashTable Sink Operator condition expressions: 0 1 {col4} keys: 0 _col0 (type: string) 1 col1 (type: string) t3:table3 TableScan alias: table3 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator expressions: col1 (type: string) outputColumnNames: _col1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE HashTable Sink Operator condition expressions: 0 {_col0} {_col7} {_col7} 1 keys: 0 _col7 (type: string) 1 _col1 (type: string) Stage: Stage-5 Map Reduce Map Operator Tree: TableScan alias: t2 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Filter Operator predicate: col1 is not null (type: boolean) Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} 1 {col4} keys: 0 _col0 (type: string) 1 col1 (type: string) outputColumnNames: _col0, _col7 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Map Join Operator condition map: Left Outer Join0 to 1 condition expressions: 0 {_col0} {_col7} {_col7} 1 keys: 0 _col7 (type: string) 1 _col1 (type: string) outputColumnNames: _col0, _col3, _col7 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator expressions: _col0 (type: string), _col3 (type: string), _col7 (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} After the patch, explain output(correct) {code} STAGE DEPENDENCIES: Stage-7 is a root stage Stage-5 depends on stages: Stage-7 Stage-0 depends on stages: Stage-5 STAGE PLANS: Stage: Stage-7 Map Reduce Local Work Alias -> Map Local Tables: t1:table1 Fetch Operator limit: -1 t3:table3 Fetch Operator limit: -1 Alias -> Map Local Operator Tree: t1:table1 TableScan alias: table1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Filter Operator predicate: col1 is not null (type: boolean) Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator expressions: col1 (type: string), col4 (type: string) outputColumnNames: _col0, _col3 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE HashTable Sink Operator condition expressions: 0 {_col3} 1 {col4} keys: 0 _col0 (type: string) 1 col1 (type: string) t3:table3 TableScan alias: table3 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator expressions: col1 (type: string) outputColumnNames: _col1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE HashTable Sink Operator condition expressions: 0 {_col0} {_col3} {_col7} 1 keys: 0 _col7 (type: string) 1 _col1 (type: string) Stage: Stage-5 Map Reduce Map Operator Tree: TableScan alias: t2 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Filter Operator predicate: col1 is not null (type: boolean) Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col3} 1 {col4} keys: 0 _col0 (type: string) 1 col1 (type: string) outputColumnNames: _col0, _col3, _col7 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Map Join Operator condition map: Left Outer Join0 to 1 condition expressions: 0 {_col0} {_col3} {_col7} 1 keys: 0 _col7 (type: string) 1 _col1 (type: string) outputColumnNames: _col0, _col3, _col7 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator expressions: _col0 (type: string), _col3 (type: string), _col7 (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} Notice the difference of "_col3" between the two explain outputs. Output of the query:(WRONG) {code} one bad_data_1 bad_data_1 one bad_data_1 bad_data_1 one bad_data_1 bad_data_1 one bad_data_1 bad_data_1 two bad_data_2 bad_data_2 two bad_data_2 bad_data_2 two bad_data_2 bad_data_2 two bad_data_2 bad_data_2 one bad_data_1 bad_data_1 one bad_data_1 bad_data_1 one bad_data_1 bad_data_1 one bad_data_1 bad_data_1 two bad_data_2 bad_data_2 two bad_data_2 bad_data_2 two bad_data_2 bad_data_2 two bad_data_2 bad_data_2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)