[jira] Commented: (HIVE-741) NULL is not handled correctly in join

Ning Zhang (JIRA) Tue, 10 Aug 2010 10:39:42 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896974#action_12896974
 ]


Ning Zhang commented on HIVE-741:
---------------------------------

Regular reduce-side joins are implemented in JoinOperator and 
CommonJoinOperator. Map-side joins are implemented in MapJoinOperator. 

In reduce-side joins, the join keys serve as the distribution keys from the 
mappers to the reducers, so each group (delimited by beginGroup() and 
endGroup()) consists of rows with the same join keys. A reduce-side join 
caches all rows within a group except those of the last table (the streaming 
table), which is scanned and joined, as a Cartesian product, with the cached 
rows of the other tables. I think the fix would be to check the join keys for 
NULL and produce the proper output based on the semantics of each join type. 
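
To make the grouping and the proposed NULL check concrete, here is a minimal 
Python sketch of a two-table reduce-side inner join. It is not Hive code: the 
names shuffle, reduce_group, inner_join and the check_null flag are all 
hypothetical, and the sketch covers only inner-join semantics (an outer join 
would have to emit NULL-padded rows instead of dropping the group). 

```python
from collections import defaultdict
from operator import itemgetter


def shuffle(left, right, left_key, right_key):
    """Group rows of both tables by join key, as the mappers would
    distribute them to the reducers. Each group is (cached, streamed)."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    return groups


def reduce_group(key, cached, streamed, check_null=True):
    """Join one group: scan the streaming table and pair every row with
    the cached rows of the other table (a Cartesian product)."""
    # Assumed fix: NULL never equals NULL, so an inner join emits nothing
    # for the group whose join key is NULL.
    if check_null and key is None:
        return []
    return [c + s for s in streamed for c in cached]


def inner_join(left, right, left_key, right_key, check_null=True):
    out = []
    groups = shuffle(left, right, left_key, right_key)
    for key, (cached, streamed) in groups.items():
        out.extend(reduce_group(key, cached, streamed, check_null))
    return out


# The data from the issue description: columns are (key, value).
input4_cb = [(None, 325), (18, None)]

# select * from input4_cb a join input4_cb b on a.key = b.value
buggy = inner_join(input4_cb, input4_cb, itemgetter(0), itemgetter(1),
                   check_null=False)
fixed = inner_join(input4_cb, input4_cb, itemgetter(0), itemgetter(1))
```

Without the check, the NULL group pairs the two rows and reproduces the row 
reported in the issue; with the check, the result is empty. 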

The map-side join is basically a hash join: the small table is read in its 
entirety into a hash table, which is then probed while scanning the streaming 
table. 
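
As a rough illustration of the build and probe phases, here is a minimal 
Python sketch of a hash inner join. It is not the MapJoinOperator code: the 
function name map_side_inner_join and its parameters are hypothetical, and 
skipping NULL keys on both sides is my assumption about how inner-join 
semantics would be enforced here. 

```python
from collections import defaultdict


def map_side_inner_join(small, streaming, small_key, streaming_key):
    """Hash inner join: build a hash table from the small table, then
    probe it once per row of the streaming table."""
    # Build phase: read the small table in its entirety.
    table = defaultdict(list)
    for row in small:
        key = small_key(row)
        if key is not None:  # assumed: NULL keys can never match
            table[key].append(row)

    # Probe phase: scan the streaming table and look up each key.
    for row in streaming:
        key = streaming_key(row)
        if key is None:
            continue
        for match in table.get(key, ()):
            yield match + row


small_rows = [(1, "a"), (None, "x")]
big_rows = [(1, "p"), (None, "q"), (2, "r")]
joined = list(map_side_inner_join(small_rows, big_rows,
                                  lambda r: r[0], lambda r: r[0]))
```

Only the rows with key 1 are joined; the two NULL-keyed rows are never paired 
with each other. 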

There are other types of joins (bucketed map-side join, sort merge join etc.), 
but they all rely on the 3 classes mentioned above. 

Let me know if you have further questions as you get started. 

> NULL is not handled correctly in join
> -------------------------------------
>
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> With the following data in table input4_cb:
> Key      Value
> ------   --------
> NULL     325
> 18       NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL    325    18   NULL
> The correct result should be an empty set.
> When 'null' is replaced by '' it works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
