[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins
[ https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1605:
-----------------------------
          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed. Thanks Ning

> regression and improvements in handling NULLs in joins
> ------------------------------------------------------
>
>                 Key: HIVE-1605
>                 URL: https://issues.apache.org/jira/browse/HIVE-1605
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1605.2.patch, HIVE-1605.3.patch, HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741. There are a lot
> of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap
> maintained for each key to remember whether it is NULL. This takes too much
> memory when the tables are large.
> A second issue is in handling NULLs if the join key has more than one column.
> This appears in regular MapJoin as well as SMBMapJoin. The code only checks
> whether all the columns are NULL. The match should return false if any joined
> value is NULL.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
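The multi-column NULL rule described in the issue can be sketched as follows. This is only an illustration of the intended semantics, not the code from the HIVE-1605 patches: the class `JoinKeyNulls` and the methods `allNull`, `anyNull`, and `keysMatch` are hypothetical names, and a composite join key is assumed to be a plain `List<Object>` with one entry per key column.

```java
import java.util.Arrays;
import java.util.List;

public class JoinKeyNulls {
    // The check the issue describes as insufficient: the key is treated
    // as NULL only when every column is NULL.
    static boolean allNull(List<Object> key) {
        for (Object col : key) {
            if (col != null) {
                return false;
            }
        }
        return true;
    }

    // The check the issue asks for: one NULL column is enough to make
    // the whole key unmatchable.
    static boolean anyNull(List<Object> key) {
        for (Object col : key) {
            if (col == null) {
                return true;
            }
        }
        return false;
    }

    // Two composite keys match only if neither contains a NULL and all
    // columns are equal.
    static boolean keysMatch(List<Object> left, List<Object> right) {
        if (anyNull(left) || anyNull(right)) {
            return false;
        }
        return left.equals(right);
    }

    public static void main(String[] args) {
        List<Object> partialNullA = Arrays.asList(1, null);
        List<Object> partialNullB = Arrays.asList(1, null);
        List<Object> fullA = Arrays.asList(1, "x");
        List<Object> fullB = Arrays.asList(1, "x");

        // The all-NULL check misses a key with a single NULL column.
        System.out.println(allNull(partialNullA));
        // Keys containing any NULL never match, even when otherwise identical.
        System.out.println(keysMatch(partialNullA, partialNullB));
        // Keys without NULLs match when all columns are equal.
        System.out.println(keysMatch(fullA, fullB));
    }
}
```

Because `anyNull` is computed from the key itself whenever it is needed, this sketch keeps no per-key bookkeeping of NULL status. Whether the committed patch removes the HashMap in the same way is not stated in this thread.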
[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins
[ https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1605:
----------------------------
    Attachment: HIVE-1605.3.patch

Uploading HIVE-1605.3.patch. Thanks, Amareshwari.
[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins
[ https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1605:
----------------------------
    Attachment: HIVE-1605.2.patch

Thanks Amareshwari for the review. The attached HIVE-1605.2.patch addresses the issues.
[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins
[ https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1605:
----------------------------
    Attachment: HIVE-1605.patch

Passed all tests except scriptfile1.q in TestMinimrCliDriver on Hadoop 0.20. This test also failed on trunk.
[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins
[ https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1605:
----------------------------
    Status: Patch Available  (was: Open)