Gopal V created HIVE-20739: ------------------------------ Summary: CBO: Rewrite INTERSECTS queries as EXISTS semi-joins Key: HIVE-20739 URL: https://issues.apache.org/jira/browse/HIVE-20739 Project: Hive Issue Type: Improvement Reporter: Gopal V
INTERSECTS clause currently materializes both sides before removing entries from each other. This means that if the intersects has filters on one side which apply to the other side, it is not transitively pushed to the other side. Here's the snippet from the TPC-DS Query8 where that is visible. {code} SELECT substr(ca_zip,1,5) ca_zip FROM customer_address WHERE substr(ca_zip,1,5) IN ( '89436', /* ... */ ','15492')) intersect (select ca_zip from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt FROM customer_address, customer WHERE ca_address_sk = c_current_addr_sk and c_preferred_cust_flag='Y' group by ca_zip having count(*) > 10)A1))A2 {code} https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpcds/query8.sql#L92 where the entire join output of (customer_address x customer) is produced where the transitive filter inference could be applied to push the filter to the customer_address in the 2nd set in intersection. -- This message was sent by Atlassian JIRA (v7.6.3#76005)