[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-30 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1605:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks, Ning.

> regression and improvements in handling NULLs in joins
> --
>
> Key: HIVE-1605
> URL: https://issues.apache.org/jira/browse/HIVE-1605
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1605.2.patch, HIVE-1605.3.patch, HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741: 
> SMBMapJoinOperator throws many OOM exceptions. They are caused by the 
> HashMap maintained for each key to remember whether it is NULL, which takes 
> too much memory when the tables are large. 
> A second issue is the handling of NULLs when the join key has more than one 
> column. It appears in regular MapJoin as well as SMBMapJoin. The code only 
> checks whether all the columns are NULL; the match should instead return 
> false if any joined value is NULL. 
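
To illustrate the intended semantics, here is a minimal sketch of an "any column is NULL" check for a multi-column join key. It is not taken from the patch: the class JoinKeyNulls and the methods hasAnyNull and keysMatch are hypothetical names, and keys are modeled as plain Java lists rather than Hive's own key objects. The check is computed on the fly for each comparison, which is one possible way to avoid keeping a per-key HashMap of NULL flags; the thread does not say whether the committed patch works this way.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Hive's actual code.
final class JoinKeyNulls {

    // Returns true if at least one column of the composite key is NULL.
    static boolean hasAnyNull(List<Object> key) {
        for (Object column : key) {
            if (column == null) {
                return true;
            }
        }
        return false;
    }

    // Equi-join semantics: NULL never equals anything, including NULL,
    // so a key with any NULL column cannot produce a match.
    static boolean keysMatch(List<Object> left, List<Object> right) {
        if (hasAnyNull(left) || hasAnyNull(right)) {
            return false;
        }
        return left.equals(right);
    }

    public static void main(String[] args) {
        List<Object> full = Arrays.<Object>asList(1, "x");
        List<Object> partial = Arrays.<Object>asList(1, null);
        List<Object> allNull = Arrays.<Object>asList(null, null);

        System.out.println(keysMatch(full, full));       // prints "true"
        System.out.println(keysMatch(partial, partial)); // prints "false"
        System.out.println(keysMatch(allNull, allNull)); // prints "false"
    }
}
```

A check that only rejects keys whose columns are all NULL would treat the (1, NULL) pair above as a match, which is the behavior the issue describes as wrong.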

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-30 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Attachment: HIVE-1605.3.patch

Uploading HIVE-1605.3.patch. Thanks, Amareshwari.




[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Attachment: HIVE-1605.2.patch

Thanks, Amareshwari, for the review. The attached HIVE-1605.2.patch addresses 
the issues.




[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Attachment: HIVE-1605.patch

Passed all tests except scriptfile1.q in TestMinimrCliDriver on Hadoop 0.20. 
This test also fails on trunk. 




[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Status: Patch Available  (was: Open)
