Chaoyu Tang created HIVE-13164:
----------------------------------

             Summary: Predicate pushdown may cause cross-product in left semi join
                 Key: HIVE-13164
                 URL: https://issues.apache.org/jira/browse/HIVE-13164
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Chaoyu Tang
            Assignee: Chaoyu Tang

For some left semi join queries such as the following:

select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0';

or

select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0';

the plans show that the join has been converted to a keyless cross-product, because the predicate was pushed down and the ON condition was dropped.

{code}
LOGICAL PLAN:
t1:t1
  TableScan (TS_0)
    alias: t1
    Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
    Filter Operator (FIL_18)
      predicate: (key = 0) (type: boolean)
      Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
      Select Operator (SEL_2)
        Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
        Reduce Output Operator (RS_9)
          sort order:
          Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
          Join Operator (JOIN_11)
            condition map:
                 Left Semi Join 0 to 1
            keys:
              0
              1
            Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
            Group By Operator (GBY_13)
              aggregations: count(1)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator (RS_14)
                sort order:
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: bigint)
                Group By Operator (GBY_15)
                  aggregations: count(VALUE._col0)
                  mode: mergepartial
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator (FS_17)
                    compressed: false
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
t2:t2
  TableScan (TS_3)
    alias: t2
    Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
    Filter Operator (FIL_19)
      predicate: ((key = 0) and (value = 'val_0')) (type: boolean)
      Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
      Select Operator (SEL_5)
        Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
        Group By Operator (GBY_8)
          keys: 'val_0' (type: string)
          mode: hash
          outputColumnNames: _col0
          Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
          Reduce Output Operator (RS_10)
            sort order:
            Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
            Join Operator (JOIN_11)
              condition map:
                   Left Semi Join 0 to 1
              keys:
                0
                1
              Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
{code}

[~gopalv], do you think these plans are valid or not?

Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
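
As a way to reason about the two queries above, here is a small Python sketch that models left semi join semantics on toy data and compares the original ON condition with the pushed-down, keyless form. It is only an illustration under stated assumptions: the rows in `t1` and `t2` are made up, `left_semi_join` is a hypothetical helper rather than anything in Hive, and the sketch says nothing about how Hive's join operator actually behaves when the keys are empty.

```python
def left_semi_join(left, right, on):
    """Reference semantics: keep each left row once if at least one
    right row satisfies the ON condition (hypothetical helper, not Hive code)."""
    return [l for l in left if any(on(l, r) for r in right)]


# Hypothetical (key, value) rows; not taken from the issue.
t1 = [(0, "val_0"), (0, "val_0"), (0, "other"), (1, "val_1")]
t2 = [(0, "val_0"), (0, "val_0"), (2, "val_2")]

# The two subqueries: select value from tN where key = 0
t1_sub = [value for key, value in t1 if key == 0]
t2_sub = [value for key, value in t2 if key == 0]

# Query 1: ... on t2.value = 'val_0'
q1_reference = left_semi_join(t1_sub, t2_sub, lambda l, r: r == "val_0")
# Pushed-down form: filter the right side first, then join with no keys.
t2_pushed = [v for v in t2_sub if v == "val_0"]
q1_pushed = left_semi_join(t1_sub, t2_pushed, lambda l, r: True)

# Query 2: ... on t1.value = 'val_0'
q2_reference = left_semi_join(t1_sub, t2_sub, lambda l, r: l == "val_0")
# Pushed-down form: filter the left side first, then join with no keys.
t1_pushed = [v for v in t1_sub if v == "val_0"]
q2_pushed = left_semi_join(t1_pushed, t2_sub, lambda l, r: True)

# A plain inner cross product would count pairs rather than left rows.
cross_count = len(t1_sub) * len(t2_pushed)

print(len(q1_reference), len(q1_pushed))  # counts for query 1
print(len(q2_reference), len(q2_pushed))  # counts for query 2
print(cross_count)                        # pair count of a true cross product
```

In this toy model the pushed-down form returns the same count as the original ON condition for both queries, provided the join still emits each left row at most once; a true cross product would count row pairs instead. Whether the keyless plan in Hive preserves that "at most once" behavior, and what it costs at runtime, is the open question in this issue.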