Clemens Valiente created HIVE-12720: ---------------------------------------
Summary: Partition pruning not happening on left join with CTE Key: HIVE-12720 URL: https://issues.apache.org/jira/browse/HIVE-12720 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Clemens Valiente In certain cases hive is not pruning partitions as well as it could be, and instead scanning the whole table. Minimal use case: {code:sql} create table tbl1 ( id1 int, ext_id2 int ) PARTITIONED BY (ymd int); create table tbl2 ( id2 int, prop string ) PARTITIONED BY (ymd int); INSERT INTO tbl1 PARTITION(ymd=20151201) VALUES (1,1); INSERT INTO tbl1 PARTITION(ymd=20151202) VALUES (1,2); INSERT INTO tbl1 PARTITION(ymd=20151203) VALUES (1,3); INSERT INTO tbl2 PARTITION(ymd=20151201) VALUES (1,'testa'); INSERT INTO tbl2 PARTITION(ymd=20151202) VALUES (2,'testb'); INSERT INTO tbl2 PARTITION(ymd=20151203) VALUES (3,'testc'); -- query 1 EXPLAIN select * from tbl1 LEFT OUTER JOIN tbl2 ON (tbl1.ext_id2 = tbl2.id2 and tbl2.ymd = tbl1.ymd) WHERE tbl1.ymd = 20151203 -- query2 EXPLAIN with cte as (select * from tbl1 WHERE tbl1.ymd = 20151203) select * from cte LEFT OUTER JOIN tbl2 ON (cte.ext_id2 = tbl2.id2 and tbl2.ymd = cte.ymd) WHERE tbl2.ymd = 20151203 {code} Query1 prunes correctly: alias: tbl2 Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Query2 does a full table scan on tbl2: Statistics: Num rows: 4 Data size: 28 Basic stats: COMPLETE Column stats: NONE I know both queries are not equivalent and the second query is not really a LEFT JOIN in its current state, but I think the optimiser could do a better job here. This example was (re)produced with a cdh 5.5 docker image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)