semi joins
----------

                 Key: HIVE-870
                 URL: https://issues.apache.org/jira/browse/HIVE-870
             Project: Hadoop Hive
          Issue Type: New Feature
            Reporter: Ning Zhang
            Assignee: Ning Zhang


Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example,

select * 
from A
where A.id IN 
   (select id
    from B
    where B.date> '2009-10-01');

returns from A whose ID is in the set of IDs found in B, whose date is greater 
than a certain date. This query can be unnested using a INNER join or LEFT 
OUTER JOIN, but we need to deduplicate the IDs returned by the subquery on 
table B. The semantics of LEFT SEMI JOIN is that as long as there is ANY row in 
the right-hand table that matches the join key, the left-hand table row will be 
emitted as a result w/o necessarily looking further in the right-hand table for 
further matches. This is exactly the semantics of the IN subquery. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to