[ https://issues.apache.org/jira/browse/HIVE-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168395#comment-15168395 ]
Chaoyu Tang commented on HIVE-13164:
------------------------------------

Yeah, with t1.key = t2.key, the query plan looks right and there is no cross-product. Thanks for pointing that out.

> Predicate pushdown may cause cross-product in left semi join
> ------------------------------------------------------------
>
> Key: HIVE-13164
> URL: https://issues.apache.org/jira/browse/HIVE-13164
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
>
> For some left semi join queries like the following:
> select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0';
> or
> select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0';
> Their plans show that they have been converted to a keyless cross-product due to the predicate pushdown and the dropping of the ON condition.
> {code}
> LOGICAL PLAN:
> t1:t1
>   TableScan (TS_0)
>     alias: t1
>     Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>     Filter Operator (FIL_18)
>       predicate: (key = 0) (type: boolean)
>       Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>       Select Operator (SEL_2)
>         Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>         Reduce Output Operator (RS_9)
>           sort order:
>           Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>           Join Operator (JOIN_11)
>             condition map:
>                  Left Semi Join 0 to 1
>             keys:
>               0
>               1
>             Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
>             Group By Operator (GBY_13)
>               aggregations: count(1)
>               mode: hash
>               outputColumnNames: _col0
>               Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator (RS_14)
>                 sort order:
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                 value expressions: _col0 (type: bigint)
>                 Group By Operator (GBY_15)
>                   aggregations: count(VALUE._col0)
>                   mode: mergepartial
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator (FS_17)
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> t2:t2
>   TableScan (TS_3)
>     alias: t2
>     Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>     Filter Operator (FIL_19)
>       predicate: ((key = 0) and (value = 'val_0')) (type: boolean)
>       Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>       Select Operator (SEL_5)
>         Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>         Group By Operator (GBY_8)
>           keys: 'val_0' (type: string)
>           mode: hash
>           outputColumnNames: _col0
>           Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>           Reduce Output Operator (RS_10)
>             sort order:
>             Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>             Join Operator (JOIN_11)
>               condition map:
>                    Left Semi Join 0 to 1
>               keys:
>                 0
>                 1
>               Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
> {code}
> [~gopalv], do you think these plans are valid or not? Thanks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
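The two queries in HIVE-13164 each have an ON condition that references only one side. After predicate pushdown, no join key is left, which is what the empty `keys: 0 / 1` in JOIN_11 shows. The Python sketch below models only the row-level semantics. It compares a reference left semi join against a keyless version in which each single-side predicate has become a filter on its own input. It assumes rows are plain dicts and that the data is illustrative. The function names are hypothetical, not Hive APIs. The sketch says nothing about the cost of the cross-product the issue is concerned with.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def left_semi_join(left: List[Row], right: List[Row],
                   cond: Callable[[Row, Row], bool]) -> List[Row]:
    # Reference semantics: keep each left row once if ANY right row satisfies cond.
    return [left_row for left_row in left
            if any(cond(left_row, right_row) for right_row in right)]


def keyless_semi_join(left: List[Row], right: List[Row],
                      left_pred: Callable[[Row], bool],
                      right_pred: Callable[[Row], bool]) -> List[Row]:
    # Hypothetical model of the pushed-down plan: each single-side predicate is
    # applied as a filter on its own input, so no join key remains. Joining every
    # surviving left row against the whole surviving right side then reduces,
    # for a semi join, to a non-emptiness check on the right side.
    kept_left = [row for row in left if left_pred(row)]
    right_non_empty = any(right_pred(row) for row in right)
    return kept_left if right_non_empty else []


# Illustrative data only; not taken from the issue.
t1 = [{"key": 0, "value": "val_0"}, {"key": 0, "value": "val_x"},
      {"key": 1, "value": "val_0"}]
t2 = [{"key": 0, "value": "val_0"}, {"key": 2, "value": "val_2"}]

# The subqueries: select value from tN where key = 0
sub1 = [r for r in t1 if r["key"] == 0]
sub2 = [r for r in t2 if r["key"] == 0]

# Query 1: ... on t2.value = 'val_0'
q1_ref = len(left_semi_join(sub1, sub2, lambda lr, rr: rr["value"] == "val_0"))
q1_plan = len(keyless_semi_join(sub1, sub2, lambda lr: True,
                                lambda rr: rr["value"] == "val_0"))

# Query 2: ... on t1.value = 'val_0'
q2_ref = len(left_semi_join(sub1, sub2, lambda lr, rr: lr["value"] == "val_0"))
q2_plan = len(keyless_semi_join(sub1, sub2, lambda lr: lr["value"] == "val_0",
                                lambda rr: True))

print(q1_ref, q1_plan, q2_ref, q2_plan)  # 2 2 1 1
```

On this toy data, the reference and keyless results agree for both queries. With an equality key such as t1.key = t2.key, the join would instead match rows per key value rather than against the whole right side.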