Gopal V created HIVE-20739:
------------------------------

             Summary: CBO: Rewrite INTERSECTS queries as EXISTS semi-joins
                 Key: HIVE-20739
                 URL: https://issues.apache.org/jira/browse/HIVE-20739
             Project: Hive
          Issue Type: Improvement
            Reporter: Gopal V


INTERSECTS clause currently materializes both sides before removing entries 
from each other.

This means that if the intersects has filters on one side which apply to the 
other side, it is not transitively pushed to the other side.

Here's the snippet from the TPC-DS Query8 where that is visible.

{code}
    SELECT substr(ca_zip,1,5) ca_zip
      FROM customer_address
      WHERE substr(ca_zip,1,5) IN (
                          '89436', /* ... */ ','15492'))
     intersect
     (select ca_zip
      from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt
            FROM customer_address, customer
            WHERE ca_address_sk = c_current_addr_sk and
                  c_preferred_cust_flag='Y'
            group by ca_zip
            having count(*) > 10)A1))A2
{code}

https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpcds/query8.sql#L92

where the entire join output of (customer_address x customer) is produced where 
the transitive filter inference could be applied to push the filter to the 
customer_address in the 2nd set in intersection.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to