[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900570#action_12900570 ]
Ning Zhang commented on HIVE-741:
---------------------------------

Looks good in general. Some minor comments:

1) Add more test cases for SMB joins. The only current test case uses a single bucket, which does not cover the most common use case. Can you add test cases with more buckets? You can take a look at the bucketed join queries included in the client positive tests.
2) SMBMapJoinOperator.compareKey() is called for each row, so it is critical for performance. In your code, hasNullElement() could be called 4 times in the worst case. If you cache the result, it needs to be called only twice.

Yongqiang, any further comments?

> NULL is not handled correctly in join
> -------------------------------------
>
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-741-1.txt, patch-741-2.txt, patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key        Value
> ------     --------
> NULL       325
> 18         NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL    325    18    NULL
> The correct result should be empty set.
> When 'null' is replaced by '' it works.
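
To illustrate comment 2), here is a minimal sketch of the caching idea. It is not the code from the attached patches. The class NullAwareKeyCompare, the call counter, and the signatures of compareKeys() and hasNullElement() are all assumptions made for illustration. Each key list is checked for NULL exactly once, the results are held in local variables, and a key containing NULL never compares equal to anything, which matches the expected empty result for the query in the issue description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the comparison logic; not Hive's actual class.
class NullAwareKeyCompare {
    // Counts null scans so the effect of caching is observable (illustration only).
    static int nullChecks = 0;

    // Assumed helper: true if any join-key column in this row is NULL.
    static boolean hasNullElement(List<?> key) {
        nullChecks++;
        for (Object o : key) {
            if (o == null) {
                return true;
            }
        }
        return false;
    }

    // Returns 0 only when both keys are NULL-free and equal column by column.
    static <T extends Comparable<T>> int compareKeys(List<T> k1, List<T> k2) {
        // Scan each side once and reuse the result instead of re-scanning per branch.
        boolean leftHasNull = hasNullElement(k1);
        boolean rightHasNull = hasNullElement(k2);

        // Simplification for this sketch: a NULL key on the left (including the
        // NULL-vs-NULL case) sorts first, so the merge advances without matching.
        if (leftHasNull) {
            return -1;
        }
        if (rightHasNull) {
            return 1;
        }

        int sizeDiff = k1.size() - k2.size();
        if (sizeDiff != 0) {
            return sizeDiff;
        }
        for (int i = 0; i < k1.size(); i++) {
            int c = k1.get(i).compareTo(k2.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<Integer> nullKey = Arrays.asList((Integer) null);
        List<Integer> key18 = Arrays.asList(18);

        System.out.println("NULL vs NULL: " + compareKeys(nullKey, nullKey));
        System.out.println("NULL vs 18:   " + compareKeys(nullKey, key18));
        System.out.println("18 vs 18:     " + compareKeys(key18, key18));
        System.out.println("null scans:   " + nullChecks);
    }
}
```

With this structure every call to compareKeys() performs exactly two null scans, whichever branch is taken, and a NULL key never produces a match.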