regression and improvements in handling NULLs in joins
------------------------------------------------------

                 Key: HIVE-1605
                 URL: https://issues.apache.org/jira/browse/HIVE-1605
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Ning Zhang
            Assignee: Ning Zhang


Sort-merge map join has regressed since HIVE-741: SMBMapJoinOperator throws 
many OOM exceptions. This is caused by the HashMap maintained for each key to 
remember whether it is NULL, which takes too much memory when the tables are 
large. 

A second issue is the handling of NULLs when the join key has more than one 
column. This affects regular MapJoin as well as SMBMapJoin. The code only 
checks whether all of the key columns are NULL. It should instead report no 
match if any joined value is NULL. 
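
The difference between the two checks can be sketched in Java as below. This 
is only an illustration of the intended semantics, not the actual Hive code: 
the class JoinKeyNulls and the methods allNull, anyNull and keysMatch are 
hypothetical names, and a join key is assumed to be a plain list of column 
values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hive.
final class JoinKeyNulls {

    // The check described as insufficient: true only if every column is NULL.
    static boolean allNull(List<Object> key) {
        for (Object column : key) {
            if (column != null) {
                return false;
            }
        }
        return true;
    }

    // The check the issue asks for: true if at least one column is NULL.
    static boolean anyNull(List<Object> key) {
        for (Object column : key) {
            if (column == null) {
                return true;
            }
        }
        return false;
    }

    // Two keys match only if neither contains a NULL and all columns are equal.
    static boolean keysMatch(List<Object> left, List<Object> right) {
        if (anyNull(left) || anyNull(right)) {
            return false;
        }
        return left.equals(right);
    }

    public static void main(String[] args) {
        List<Object> partial = Arrays.<Object>asList(1, null);
        List<Object> full = Arrays.<Object>asList(1, "a");
        System.out.println(allNull(partial));            // false: the NULL goes unnoticed
        System.out.println(anyNull(partial));            // true
        System.out.println(keysMatch(partial, partial)); // false
        System.out.println(keysMatch(full, full));       // true
    }
}
```

With a two-column key such as (1, NULL), an all-columns check does not flag 
the key, so a plain equality comparison could still pair two such rows. The 
any-column check rejects the pair, which follows the SQL rule that NULL never 
compares equal to anything.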

