dima machlin created HIVE-7346: ---------------------------------- Summary: Wrong results caused by hive ppd under specific join condition Key: HIVE-7346 URL: https://issues.apache.org/jira/browse/HIVE-7346 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: dima machlin
Assuming two tables : {code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code} t1 contains 1 row : 'a','a' t2 contains 1 row : 'a',2 The following query : {code:sql} select a.*,b.d d1,c.d d2 from t1 a join t2 b on (a.id1=b.id) join t2 c on (a.id2=b.id) where b.d <=1 and c.d<=1 {code} Returns 0 rows as expected because t2.d = 2 Wrapping this query, like so : {code:sql} select * from ( select a.*,b.d d1,c.d d2 from t1 a join t2 b on (a.id1=b.id) join t2 c on (a.id2=b.id) where b.d <=1 and c.d<=1 ) z where d1>1 or d2>1 {code} Where another filter was add on the columns causes the plan to lack the filter of the "<=1" and return a single row - *Wrong Results*. The plan is : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1))))) STAGE DEPENDENCIES: Stage-7 is a root stage Stage-5 depends on stages: Stage-7 Stage-0 is a root stage STAGE PLANS: Stage: Stage-7 Map Reduce Local Work Alias -> Map Local Tables: z:b Fetch Operator limit: -1 z:c Fetch Operator limit: -1 Alias -> Map Local Operator Tree: z:b TableScan alias: b HashTable Sink Operator condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] Position of Big Table: 0 z:c TableScan alias: c HashTable Sink Operator condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 0 Stage: Stage-5 Map Reduce Alias -> Map Operator Tree: z:a TableScan alias: a Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] outputColumnNames: _col0, _col1, _col4, _col5 Position of Big Table: 0 Filter Operator predicate: expr: (_col1 = _col4) type: boolean Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col1, _col4, _col5, _col9 Position of Big Table: 0 Filter Operator predicate: expr: ((_col1 > 1) or (_col9 > 1)) type: boolean Select Operator expressions: expr: _col4 type: string expr: _col5 type: string expr: _col1 type: int expr: _col9 type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 {code} Setting : {code:sql} hive.optimize.ppd=false {code} Results in the following *correct* plan : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1))))) STAGE DEPENDENCIES: Stage-7 is a root stage Stage-5 depends on stages: Stage-7 Stage-0 is a root stage STAGE PLANS: Stage: Stage-7 Map Reduce Local Work Alias -> Map Local Tables: z:b Fetch Operator limit: -1 z:c Fetch Operator limit: -1 Alias -> Map Local Operator Tree: z:b TableScan alias: b HashTable Sink Operator condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] Position of Big Table: 0 z:c TableScan alias: c HashTable Sink Operator condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 0 Stage: Stage-5 Map Reduce Alias -> Map Operator Tree: z:a TableScan alias: a Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] outputColumnNames: _col0, _col1, _col4, _col5 Position of Big Table: 0 Filter Operator predicate: expr: (_col1 = _col4) type: boolean Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col1, _col4, _col5, _col9 Position of Big Table: 0 Filter Operator predicate: expr: ((_col1 <= 1) and (_col9 <= 1)) type: boolean Select Operator expressions: expr: _col4 type: string expr: _col5 type: string expr: _col1 type: int expr: _col9 type: int outputColumnNames: _col0, _col1, _col2, _col3 Filter Operator predicate: expr: ((_col2 > 1) or (_col3 > 1)) type: boolean Select Operator expressions: expr: _col0 type: string expr: _col1 type: string expr: _col2 type: int expr: _col3 type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)