[
https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815095#comment-13815095
]
Sivaramakrishnan Narayanan commented on HIVE-784:
-------------------------------------------------
Haven't looked at the code yet - I'm commenting based on the design doc.
Comment on section 2.2.1 in the design doc.
One thing to be wary of the NOT-IN to LOJ with null check transformation is
that it is only valid if the subquery's output is known to be non-nullable.
Example, consider two tables:
||T1.x|
|1|
|2|
|null|
||T2.y|
|1|
|null|
Now consider this uncorrelated NOT-IN subquery:
{code}
select * from T1 where T1.x not in (select y from T2)
{code}
Should produce (if I remember my SQL semantics correctly), an empty result i.e.
0 rows.
This is because "not in" is equivalent to <> ALL. If the inner subquery
produces a null, then its comparison with anything always produces null.
Therefore, <> ALL check fails for every x from T1. Therefore, the result is
empty.
According to section 2.2.1, the transformation to LOJ with null check will look
like this:
{code}
select * from T1 Left Outer Join (select y from T2) sq1 on (T1.x=sq1.y) where
sq1.y is null
{code}
The LOJ will produce rows like this:
||T1.x|T2.y||
|1| 1|
|2| null|
|null| null|
Applying the null filter after the LOJ produces two rows. This is an incorrect
result. If this case is already covered in your implementation, please ignore
the comment.
> Support uncorrelated subqueries in the WHERE clause
> ---------------------------------------------------
>
> Key: HIVE-784
> URL: https://issues.apache.org/jira/browse/HIVE-784
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Ning Zhang
> Assignee: Harish Butani
> Fix For: 0.13.0
>
> Attachments: D13443.1.patch, D13443.2.patch, HIVE-784.1.patch.txt,
> HIVE-784.2.patch, SubQuerySpec.pdf, tpchQueriesUsingSubQueryClauses.sql
>
>
> Hive currently only support views in the FROM-clause, some Facebook use cases
> suggest that Hive should support subqueries such as those connected by
> IN/EXISTS in the WHERE-clause.
--
This message was sent by Atlassian JIRA
(v6.1#6144)