Clemens Valiente created HIVE-12720:
---------------------------------------

             Summary: Partition pruning not happening on left join with CTE
                 Key: HIVE-12720
                 URL: https://issues.apache.org/jira/browse/HIVE-12720
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Clemens Valiente


In certain cases hive is not pruning partitions as well as it could be, and 
instead scanning the whole table.
Minimal use case:
{code:sql}
create table tbl1 (
id1 int,
ext_id2 int
)
PARTITIONED BY (ymd int);

create table tbl2 (
id2 int,
prop string
)
PARTITIONED BY (ymd int);

INSERT INTO tbl1 PARTITION(ymd=20151201) VALUES (1,1);
INSERT INTO tbl1 PARTITION(ymd=20151202) VALUES (1,2);
INSERT INTO tbl1 PARTITION(ymd=20151203) VALUES (1,3);

INSERT INTO tbl2 PARTITION(ymd=20151201) VALUES (1,'testa');
INSERT INTO tbl2 PARTITION(ymd=20151202) VALUES (2,'testb');
INSERT INTO tbl2 PARTITION(ymd=20151203) VALUES (3,'testc');


-- query 1
EXPLAIN
select * from
tbl1
LEFT OUTER JOIN
tbl2
ON
(tbl1.ext_id2 = tbl2.id2
 and tbl2.ymd = tbl1.ymd)
WHERE
tbl1.ymd = 20151203


-- query2
EXPLAIN
with cte as (select * from tbl1
WHERE
tbl1.ymd = 20151203)

select * from
cte
LEFT OUTER JOIN
tbl2
ON
(cte.ext_id2 = tbl2.id2
 and tbl2.ymd = cte.ymd)
WHERE
tbl2.ymd = 20151203
{code}

Query1 prunes correctly:
alias: tbl2
Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE
Query2 does a full table scan on tbl2:
 Statistics: Num rows: 4 Data size: 28 Basic stats: COMPLETE Column stats: NONE
I know both queries are not equivalent and the second query is not really a 
LEFT JOIN in its current state, but I think the optimiser could do a better job 
here.

This example was (re)produced with a cdh 5.5 docker image.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to