[ https://issues.apache.org/jira/browse/HIVE-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-870: -------------------------------- Summary: Implement LEFT SEMI JOIN (was: semi joins) Component/s: Query Processor > Implement LEFT SEMI JOIN > ------------------------ > > Key: HIVE-870 > URL: https://issues.apache.org/jira/browse/HIVE-870 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Ning Zhang > Assignee: Ning Zhang > Fix For: 0.5.0 > > Attachments: Hive-870.patch, Hive-870_2.patch, Hive-870_3.patch > > > Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example, > select * > from A > where A.id IN > (select id > from B > where B.date> '2009-10-01'); > returns from A whose ID is in the set of IDs found in B, whose date is > greater than a certain date. This query can be unnested using a INNER join or > LEFT OUTER JOIN, but we need to deduplicate the IDs returned by the subquery > on table B. The semantics of LEFT SEMI JOIN is that as long as there is ANY > row in the right-hand table that matches the join key, the left-hand table > row will be emitted as a result w/o necessarily looking further in the > right-hand table for further matches. This is exactly the semantics of the IN > subquery. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.