[jira] Commented: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904088#action_12904088
 ] 

Amareshwari Sriramadasu commented on HIVE-1605:
---

MapJoinOperator.java still has a debug log. Otherwise, patch looks fine.

> regression and improvements in handling NULLs in joins
> --
>
> Key: HIVE-1605
> URL: https://issues.apache.org/jira/browse/HIVE-1605
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1605.2.patch, HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741. There are a lot
> of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap
> maintained for each key to remember whether it is NULL. This takes too much
> memory when the tables are large.
> A second issue is in handling NULLs if the join keys span more than one column.
> This appears in regular MapJoin as well as SMBMapJoin. The code only checks
> if all the columns are NULL. It should return false for a match if any joined
> value is NULL.
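
The second issue above can be illustrated with a minimal, standalone sketch. It is not the code from the patch: the class JoinKeyNulls and the helpers hasAnyNulls and keysMatch are hypothetical names, and hasAllNulls here only mimics the behaviour the description attributes to the existing check. A join key is modelled as a plain List with one element per key column.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: models a multi-column join key as a List of column values.
final class JoinKeyNulls {

    // The check described as insufficient: true only when every column is NULL.
    static boolean hasAllNulls(List<?> key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        for (Object col : key) {
            if (col != null) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical replacement: true as soon as any single column is NULL.
    static boolean hasAnyNulls(List<?> key) {
        if (key == null) {
            return true;
        }
        for (Object col : key) {
            if (col == null) {
                return true;
            }
        }
        return false;
    }

    // SQL join semantics: NULL never equals anything, so a key containing
    // a NULL column cannot match, even against an identical key.
    static boolean keysMatch(List<?> left, List<?> right) {
        if (hasAnyNulls(left) || hasAnyNulls(right)) {
            return false;
        }
        return left.equals(right);
    }

    public static void main(String[] args) {
        List<Object> a = Arrays.asList("x", null);
        List<Object> b = Arrays.asList("x", null);
        System.out.println(hasAllNulls(a));   // false: the all-NULL check misses this key
        System.out.println(keysMatch(a, b));  // false: one NULL column prevents the match
        System.out.println(keysMatch(Arrays.asList("x", 1), Arrays.asList("x", 1))); // true
    }
}
```

With only the all-NULL check, the key ("x", NULL) would be treated as an ordinary key and could match an identical row on the other side, which is the incorrect result the description refers to.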

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Attachment: HIVE-1605.2.patch

Thanks, Amareshwari, for the review. The attached HIVE-1605.2.patch addresses
the issues.

> regression and improvements in handling NULLs in joins
> --
>
> Key: HIVE-1605
> URL: https://issues.apache.org/jira/browse/HIVE-1605
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1605.2.patch, HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741. There are a lot
> of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap
> maintained for each key to remember whether it is NULL. This takes too much
> memory when the tables are large.
> A second issue is in handling NULLs if the join keys span more than one column.
> This appears in regular MapJoin as well as SMBMapJoin. The code only checks
> if all the columns are NULL. It should return false for a match if any joined
> value is NULL.




[jira] Commented: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904085#action_12904085
 ] 

Amareshwari Sriramadasu commented on HIVE-1605:
---

Ning, Thanks for looking into this. A couple of minor comments:
* The patch has many debug logs and commented-out code. Do you want to remove them?
* Do you want to remove the hasAllNulls method from AbstractMapJoinOperator.java?




> regression and improvements in handling NULLs in joins
> --
>
> Key: HIVE-1605
> URL: https://issues.apache.org/jira/browse/HIVE-1605
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741. There are a lot
> of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap
> maintained for each key to remember whether it is NULL. This takes too much
> memory when the tables are large.
> A second issue is in handling NULLs if the join keys span more than one column.
> This appears in regular MapJoin as well as SMBMapJoin. The code only checks
> if all the columns are NULL. It should return false for a match if any joined
> value is NULL.




[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Attachment: HIVE-1605.patch

Passed all tests except scriptfile1.q in TestMinimrCliDriver on Hadoop 0.20.
This test also failed on trunk.

> regression and improvements in handling NULLs in joins
> --
>
> Key: HIVE-1605
> URL: https://issues.apache.org/jira/browse/HIVE-1605
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741. There are a lot
> of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap
> maintained for each key to remember whether it is NULL. This takes too much
> memory when the tables are large.
> A second issue is in handling NULLs if the join keys span more than one column.
> This appears in regular MapJoin as well as SMBMapJoin. The code only checks
> if all the columns are NULL. It should return false for a match if any joined
> value is NULL.




[jira] Updated: (HIVE-1605) regression and improvements in handling NULLs in joins

2010-08-29 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1605:
-

Status: Patch Available  (was: Open)

> regression and improvements in handling NULLs in joins
> --
>
> Key: HIVE-1605
> URL: https://issues.apache.org/jira/browse/HIVE-1605
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: HIVE-1605.patch
>
>
> There are regressions in sort-merge map join after HIVE-741. There are a lot
> of OOM exceptions in SMBMapJoinOperator. This is caused by the HashMap
> maintained for each key to remember whether it is NULL. This takes too much
> memory when the tables are large.
> A second issue is in handling NULLs if the join keys span more than one column.
> This appears in regular MapJoin as well as SMBMapJoin. The code only checks
> if all the columns are NULL. It should return false for a match if any joined
> value is NULL.




[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-08-29 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-471:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Edward

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
> Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch,
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, HIVE-471.6.patch.txt,
> hive-471.diff
>
>
> There are many methods in Java that are static and have no arguments or can
> be invoked with one simple parameter. More complicated functions will require
> a dedicated UDF, but one generic one can work as a poor man's UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1),
>        reflect("java.lang.String", "isEmpty")
> FROM src LIMIT 1;
> {noformat}
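
The idea behind reflect() can be sketched in plain Java. This is not the UDF from the attached patches: ReflectSketch, invokeStatic, accepts and box are hypothetical names, and the sketch assumes that only public static methods are looked up and that boxed arguments match primitive parameters exactly (no widening). The instance method isEmpty from the query above would additionally need a receiver object, which this sketch does not handle.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Illustrative only: look up a public static method by class and method name
// and call it with the supplied arguments.
final class ReflectSketch {

    static Object invokeStatic(String className, String methodName, Object... args) {
        try {
            Class<?> cls = Class.forName(className);
            for (Method m : cls.getMethods()) {
                if (!m.getName().equals(methodName)
                        || !Modifier.isStatic(m.getModifiers())) {
                    continue;
                }
                if (accepts(m.getParameterTypes(), args)) {
                    return m.invoke(null, args); // null receiver: static call
                }
            }
            throw new NoSuchMethodException(className + "." + methodName);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "cannot invoke " + className + "." + methodName, e);
        }
    }

    // True when each argument is an instance of the (boxed) parameter type.
    private static boolean accepts(Class<?>[] params, Object[] args) {
        if (params.length != args.length) {
            return false;
        }
        for (int i = 0; i < params.length; i++) {
            if (args[i] == null) {
                if (params[i].isPrimitive()) {
                    return false; // a primitive parameter cannot take null
                }
            } else if (!box(params[i]).isInstance(args[i])) {
                return false;
            }
        }
        return true;
    }

    // Maps a primitive type to its wrapper; other types are returned unchanged.
    private static Class<?> box(Class<?> p) {
        if (p == int.class) return Integer.class;
        if (p == long.class) return Long.class;
        if (p == double.class) return Double.class;
        if (p == float.class) return Float.class;
        if (p == boolean.class) return Boolean.class;
        if (p == char.class) return Character.class;
        if (p == short.class) return Short.class;
        if (p == byte.class) return Byte.class;
        return p;
    }

    public static void main(String[] args) {
        System.out.println(invokeStatic("java.lang.String", "valueOf", 1)); // prints 1
        System.out.println(invokeStatic("java.lang.Math", "max", 3, 7));    // prints 7
    }
}
```

Several String.valueOf overloads accept a boxed Integer (valueOf(int) and valueOf(Object)); whichever one the loop finds first, the result is the string "1".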




[jira] Updated: (HIVE-1536) Add support for JDBC PreparedStatements

2010-08-29 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1536:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks Sean!


> Add support for JDBC PreparedStatements
> ---
>
> Key: HIVE-1536
> URL: https://issues.apache.org/jira/browse/HIVE-1536
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Drivers
> Affects Versions: 0.6.0
> Reporter: Sean Flatley
> Assignee: Sean Flatley
> Fix For: 0.7.0
>
> Attachments: all-tests-ant.log, HIVE-1536-2.patch, HIVE-1536-3.patch, 
> HIVE-1536-changes-2.txt, HIVE-1536-changes-3.txt, HIVE-1536-changes.txt, 
> HIVE-1536.patch, JdbcDriverTest-ant-2.log, JdbcDriverTest-ant.log, 
> TestJdbcDriver-ant-3.log
>
>
> As a result of a Sprint which had us using Pentaho Data Integration with the 
> Hive database we have updated the driver.  Many PreparedStatement methods 
> have been implemented.  A patch will be attached tomorrow with a summary of 
> changes.
> Note: A checkout of Hive/trunk was performed and the TestJdbcDriver test
> case was run. This was done before any modifications were made to the
> checked-out project. The testResultSetMetaData test failed:
> java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>   at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
>   at org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData(TestJdbcDriver.java:530)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at junit.framework.TestCase.runTest(TestCase.java:154)
>   at junit.framework.TestCase.runBare(TestCase.java:127)
>   at junit.framework.TestResult$1.protect(TestResult.java:106)
>   at junit.framework.TestResult.runProtected(TestResult.java:124)
>   at junit.framework.TestResult.run(TestResult.java:109)
>   at junit.framework.TestCase.run(TestCase.java:118)
>   at junit.framework.TestSuite.runTest(TestSuite.java:208)
>   at junit.framework.TestSuite.run(TestSuite.java:203)
>   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
>   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
>   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
> A co-worker did the same and the tests passed.  Both environments were Ubuntu 
> and Hadoop version 0.20.2.
> Tests added to the TestJdbcDriver by us were successful.




[jira] Commented: (HIVE-471) A UDF for simple reflection

2010-08-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903987#action_12903987
 ] 

Namit Jain commented on HIVE-471:
-

+1

will commit if the tests pass

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
> Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch,
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, HIVE-471.6.patch.txt,
> hive-471.diff
>
>
> There are many methods in Java that are static and have no arguments or can
> be invoked with one simple parameter. More complicated functions will require
> a dedicated UDF, but one generic one can work as a poor man's UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1),
>        reflect("java.lang.String", "isEmpty")
> FROM src LIMIT 1;
> {noformat}




Running JUnit with eclipse on a fresh checkout of hive trunk fails with NPE

2010-08-29 Thread Maxim Veksler
Hi,

I'm trying to check out and run some Hive code here.
Ubuntu 64bit, Java 1.6 64bit, Eclipse 3.6.

For the following chain of events:
ma...@maxim-desktop:~/workspace$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hadoop-hive-trunk
ma...@maxim-desktop:~/workspace$ cd hadoop-hive-trunk
ma...@maxim-desktop:~/workspace/hadoop-hive-trunk$ ant tar
ma...@maxim-desktop:~/workspace/hadoop-hive-trunk$ ant eclipse-files

>> Import new project in eclipse
>> Right click on hadoop-hive-trunk > Run As > JUnit Test

I get the following error:
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.TestExecDriver.<clinit>(TestExecDriver.java:102)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at junit.framework.TestSuite.createTest(TestSuite.java:131)
    at junit.framework.TestSuite.addTestMethod(TestSuite.java:114)
    at junit.framework.TestSuite.<init>(TestSuite.java:75)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestLoader.getTest(JUnit3TestLoader.java:102)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestLoader.loadTests(JUnit3TestLoader.java:59)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:452)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
java.lang.ExceptionInInitializerError
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at junit.framework.TestSuite.createTest(TestSuite.java:131)
    at junit.framework.TestSuite.addTestMethod(TestSuite.java:114)
    at junit.framework.TestSuite.<init>(TestSuite.java:75)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestLoader.getTest(JUnit3TestLoader.java:102)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestLoader.loadTests(JUnit3TestLoader.java:59)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:452)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.RuntimeException: Encountered throwable
    at org.apache.hadoop.hive.ql.exec.TestExecDriver.<clinit>(TestExecDriver.java:127)
    ... 13 more


What am I doing wrong? I would like to be able to run the Hive unit tests so
that I can play a bit with the code.

Thank you,
Maxim.