[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898080#action_12898080
 ] 

Joydeep Sen Sarma commented on HIVE-1530:
-

OK - that makes sense: leave a hive-site.xml.sample and a 
hive-log4j.properties.example in conf/. I agree with Ed's point about how 
difficult it is to figure out Hadoop config variables now, and Hadoop is worse 
off for it. Commands are nice, but having a template is better: it's easy to 
clone an example file and append to or modify the default description to add 
site-specific notes, and one can grep it.

We could autogenerate hive-site.xml.sample from the config variable metadata 
in the source code; that would keep the sample in sync with the code.
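
A hedged sketch of such a generator, assuming HiveConf.ConfVars exposes its 
variable name and default value as public fields (per-variable description 
metadata would first have to be added to the enum):

{code}
// Hypothetical generator: walk HiveConf.ConfVars and emit hive-site.xml.sample.
// Assumes public varname/defaultVal fields on the enum; descriptions are a TODO.
import java.io.PrintWriter;
import org.apache.hadoop.hive.conf.HiveConf;

public class GenHiveSiteSample {
  public static void main(String[] args) throws Exception {
    PrintWriter out = new PrintWriter("hive-site.xml.sample", "UTF-8");
    out.println("<?xml version=\"1.0\"?>");
    out.println("<configuration>");
    for (HiveConf.ConfVars cv : HiveConf.ConfVars.values()) {
      out.println("  <property>");
      out.println("    <name>" + cv.varname + "</name>");
      out.println("    <value>" + cv.defaultVal + "</value>");
      out.println("  </property>");
    }
    out.println("</configuration>");
    out.close();
  }
}
{code}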

> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1528) JSON UDTF function

2010-08-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1528:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks Ning!


> JSON UDTF function
> --
>
> Key: HIVE-1528
> URL: https://issues.apache.org/jira/browse/HIVE-1528
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1528.2.patch, HIVE-1528.patch
>
>
> Currently the only way to evaluate a path expression on a JSON object is 
> through get_json_object. If many fields of the JSON object need to be 
> extracted, we have to call this UDF multiple times. 
> There are many use cases where get_json_object needs to be called many times 
> in one query to convert the JSON object to a relational schema. It would be 
> much more desirable to have a JSON UDTF that supports the following syntax:
> {code}
> select a.id, b.*
> from a lateral view json_table(a.json_object, '$.f1',  '$.f2', ..., '$.fn') b 
> as f1, f2, ..., fn
> {code}
> where the json_table function scans the json_object only once and returns a 
> set of tuples (f1, f2, ..., fn).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1537) Allow users to specify LOCATION in CREATE DATABASE statement

2010-08-12 Thread Carl Steinbach (JIRA)
Allow users to specify LOCATION in CREATE DATABASE statement


 Key: HIVE-1537
 URL: https://issues.apache.org/jira/browse/HIVE-1537
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Carl Steinbach




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1535) alter partition should throw exception if the specified partition does not exist.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1535:
---

Status: Patch Available  (was: Open)

> alter partition should throw exception if the specified partition does not 
> exist.
> -
>
> Key: HIVE-1535
> URL: https://issues.apache.org/jira/browse/HIVE-1535
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1535.1.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1535) alter partition should throw exception if the specified partition does not exist.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1535:
---

Attachment: hive-1535.1.patch

No negative tests are included because Hive is using a local metastore, which 
already throws an exception if the partition does not exist. So there is no 
problem when running with a local metastore.

> alter partition should throw exception if the specified partition does not 
> exist.
> -
>
> Key: HIVE-1535
> URL: https://issues.apache.org/jira/browse/HIVE-1535
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1535.1.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore

2010-08-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1428:
-

Status: Patch Available  (was: Reopened)

> ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
> --
>
> Key: HIVE-1428
> URL: https://issues.apache.org/jira/browse/HIVE-1428
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Paul Yang
>Assignee: Pradeep Kamath
> Fix For: 0.6.0
>
> Attachments: HIVE-1428-0.6.0-patch.txt, HIVE-1428-2.patch, 
> HIVE-1428-3.patch, HIVE-1428.patch, TestHiveMetaStoreRemote.java
>
>
> If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD 
> PARTITION commands will fail with an error similar to the following:
> [prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e "ALTER TABLE 
> mytable add partition(datestamp = '20091101', srcid = '10',action) location 
> '/user/pradeepk/mytable/20091101/10';"
> 10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found 
> in the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> Hive history 
> file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
> FAILED: Error in metadata: org.apache.thrift.TApplicationException: 
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> [prade...@chargesize:~/dev/howl]
> This is due to a check that tries to retrieve the partition to see if it 
> exists. If it does not, an attempt is made to pass a null value from the 
> metastore. Since thrift does not support null return values, an exception is 
> thrown.
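
(The committed patch is not reproduced in this thread. As a rough sketch of 
the pattern, the existence check can rely on the metastore's 
NoSuchObjectException instead of a null return that Thrift cannot serialize; 
the client call below follows the metastore API of that era and is an 
assumption.)

{code}
// Illustrative sketch only, not the committed fix: treat NoSuchObjectException
// as "partition absent" rather than expecting a null across the Thrift boundary.
import java.util.List;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionCheck {
  static Partition getPartitionOrNull(HiveMetaStoreClient client, String db,
      String table, List<String> partVals) throws Exception {
    try {
      return client.getPartition(db, table, partVals);
    } catch (NoSuchObjectException e) {
      return null; // does not exist; safe to proceed with ADD PARTITION
    }
  }
}
{code}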

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore

2010-08-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1428:
-

Fix Version/s: 0.6.0
   (was: 0.7.0)
Affects Version/s: 0.5.0
   (was: 0.6.0)
   (was: 0.7.0)

> ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
> --
>
> Key: HIVE-1428
> URL: https://issues.apache.org/jira/browse/HIVE-1428
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Paul Yang
>Assignee: Pradeep Kamath
> Fix For: 0.6.0
>
> Attachments: HIVE-1428-0.6.0-patch.txt, HIVE-1428-2.patch, 
> HIVE-1428-3.patch, HIVE-1428.patch, TestHiveMetaStoreRemote.java
>
>
> If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD 
> PARTITION commands will fail with an error similar to the following:
> [prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e "ALTER TABLE 
> mytable add partition(datestamp = '20091101', srcid = '10',action) location 
> '/user/pradeepk/mytable/20091101/10';"
> 10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found 
> in the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> Hive history 
> file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
> FAILED: Error in metadata: org.apache.thrift.TApplicationException: 
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> [prade...@chargesize:~/dev/howl]
> This is due to a check that tries to retrieve the partition to see if it 
> exists. If it does not, an attempt is made to pass a null value from the 
> metastore. Since thrift does not support null return values, an exception is 
> thrown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore

2010-08-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reopened HIVE-1428:
--


We need to backport this fix to 0.6.0

> ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
> --
>
> Key: HIVE-1428
> URL: https://issues.apache.org/jira/browse/HIVE-1428
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Pradeep Kamath
> Fix For: 0.7.0
>
> Attachments: HIVE-1428-0.6.0-patch.txt, HIVE-1428-2.patch, 
> HIVE-1428-3.patch, HIVE-1428.patch, TestHiveMetaStoreRemote.java
>
>
> If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD 
> PARTITION commands will fail with an error similar to the following:
> [prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e "ALTER TABLE 
> mytable add partition(datestamp = '20091101', srcid = '10',action) location 
> '/user/pradeepk/mytable/20091101/10';"
> 10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found 
> in the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> Hive history 
> file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
> FAILED: Error in metadata: org.apache.thrift.TApplicationException: 
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> [prade...@chargesize:~/dev/howl]
> This is due to a check that tries to retrieve the partition to see if it 
> exists. If it does not, an attempt is made to pass a null value from the 
> metastore. Since thrift does not support null return values, an exception is 
> thrown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore

2010-08-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1428:
-

Attachment: HIVE-1428-0.6.0-patch.txt

> ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
> --
>
> Key: HIVE-1428
> URL: https://issues.apache.org/jira/browse/HIVE-1428
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Pradeep Kamath
> Fix For: 0.7.0
>
> Attachments: HIVE-1428-0.6.0-patch.txt, HIVE-1428-2.patch, 
> HIVE-1428-3.patch, HIVE-1428.patch, TestHiveMetaStoreRemote.java
>
>
> If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD 
> PARTITION commands will fail with an error similar to the following:
> [prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e "ALTER TABLE 
> mytable add partition(datestamp = '20091101', srcid = '10',action) location 
> '/user/pradeepk/mytable/20091101/10';"
> 10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found 
> in the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> Hive history 
> file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
> FAILED: Error in metadata: org.apache.thrift.TApplicationException: 
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> [prade...@chargesize:~/dev/howl]
> This is due to a check that tries to retrieve the partition to see if it 
> exists. If it does not, an attempt is made to pass a null value from the 
> metastore. Since thrift does not support null return values, an exception is 
> thrown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1536) Add support for JDBC PreparedStatements

2010-08-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1536:
-

Summary: Add support for JDBC PreparedStatements  (was: Adding 
implementation to JDBC driver)
Component/s: Drivers

> Add support for JDBC PreparedStatements
> ---
>
> Key: HIVE-1536
> URL: https://issues.apache.org/jira/browse/HIVE-1536
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Sean Flatley
>
> As a result of a sprint that had us using Pentaho Data Integration with the 
> Hive database, we have updated the driver.  Many PreparedStatement methods 
> have been implemented.  A patch will be attached tomorrow with a summary of 
> changes.
> Note:  A checkout of Hive/trunk was performed and the TestJdbcDriver test 
> case was run.  This was done before any modifications were made to the 
> checked-out project.  The testResultSetMetaData test failed:
> java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: 
> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>   at 
> org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
>   at 
> org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData(TestJdbcDriver.java:530)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at junit.framework.TestCase.runTest(TestCase.java:154)
>   at junit.framework.TestCase.runBare(TestCase.java:127)
>   at junit.framework.TestResult$1.protect(TestResult.java:106)
>   at junit.framework.TestResult.runProtected(TestResult.java:124)
>   at junit.framework.TestResult.run(TestResult.java:109)
>   at junit.framework.TestCase.run(TestCase.java:118)
>   at junit.framework.TestSuite.runTest(TestSuite.java:208)
>   at junit.framework.TestSuite.run(TestSuite.java:203)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
> A co-worker did the same and the tests passed.  Both environments ran Ubuntu 
> with Hadoop 0.20.2.
> The tests we added to TestJdbcDriver were successful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898040#action_12898040
 ] 

Carl Steinbach commented on HIVE-1530:
--

@Ed: In my opinion the ideal solution is to get rid of the hive-default.xml 
file entirely and rely on the default values that appear in HiveConf. We can 
add a {{describe property }} command that prints out a description of the 
property, and also add checks that protect the {{hive.*}} configuration 
property namespace (i.e. you can't set a {{hive.*}} property unless it is 
defined in HiveConf).  Another advantage of this approach is that we don't have 
to worry about hive-default.xml falling out of sync with HiveConf, e.g. a user 
upgrades to a new version of Hive but continues to use an older copy of 
hive-default.xml.
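
A minimal sketch of what the proposed {{hive.*}} namespace check could look 
like, assuming HiveConf.ConfVars exposes each variable's name as a public 
field; illustrative only, not the eventual implementation:

{code}
// Hypothetical guard for the hive.* namespace: a SET of an unknown hive.* key
// is rejected by consulting the ConfVars enum in HiveConf.
import org.apache.hadoop.hive.conf.HiveConf;

public class HivePropertyGuard {
  static void checkHiveProperty(String key) {
    if (!key.startsWith("hive.")) {
      return; // only the hive.* namespace is protected
    }
    for (HiveConf.ConfVars cv : HiveConf.ConfVars.values()) {
      if (cv.varname.equals(key)) {
        return; // known property, allow the SET
      }
    }
    throw new IllegalArgumentException(
        "hive.* property not defined in HiveConf: " + key);
  }
}
{code}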


> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1528) JSON UDTF function

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898039#action_12898039
 ] 

John Sichi commented on HIVE-1528:
--

Will commit if tests pass.


> JSON UDTF function
> --
>
> Key: HIVE-1528
> URL: https://issues.apache.org/jira/browse/HIVE-1528
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1528.2.patch, HIVE-1528.patch
>
>
> Currently the only way to evaluate a path expression on a JSON object is 
> through get_json_object. If many fields of the JSON object need to be 
> extracted, we have to call this UDF multiple times. 
> There are many use cases where get_json_object needs to be called many times 
> in one query to convert the JSON object to a relational schema. It would be 
> much more desirable to have a JSON UDTF that supports the following syntax:
> {code}
> select a.id, b.*
> from a lateral view json_table(a.json_object, '$.f1',  '$.f2', ..., '$.fn') b 
> as f1, f2, ..., fn
> {code}
> where the json_table function scans the json_object only once and returns a 
> set of tuples (f1, f2, ..., fn).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898038#action_12898038
 ] 

Carl Steinbach commented on HIVE-1530:
--

bq. but users may want to modify the log4j.properties files. how would they do 
that in the new arrangement?

Hive uses a classloader to load the hive-log4j and hive-exec-log4j property 
resources. If a user wants to override the log4j properties that are bundled 
with the JAR, they only need to make sure that their copy appears first on the 
CLASSPATH.
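
A quick way to verify which copy wins, relying only on standard first-match 
classloader semantics (nothing Hive-specific assumed):

{code}
// The classloader returns the first hive-log4j.properties on the CLASSPATH,
// so a conf directory listed before hive-common-*.jar shadows the bundled copy.
public class WhichLog4j {
  public static void main(String[] args) {
    java.net.URL res = Thread.currentThread().getContextClassLoader()
        .getResource("hive-log4j.properties");
    System.out.println("hive-log4j.properties resolved from: " + res);
  }
}
{code}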



> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1536) Adding implementation to JDBC driver

2010-08-12 Thread Sean Flatley (JIRA)
Adding implementation to JDBC driver


 Key: HIVE-1536
 URL: https://issues.apache.org/jira/browse/HIVE-1536
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Sean Flatley


As a result of a sprint that had us using Pentaho Data Integration with the 
Hive database, we have updated the driver.  Many PreparedStatement methods have 
been implemented.  A patch will be attached tomorrow with a summary of changes.


Note:  A checkout of Hive/trunk was performed and the TestJdbcDriver test case 
was run.  This was done before any modifications were made to the checked-out 
project.  The testResultSetMetaData test failed:

java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: 
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
at 
org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
at 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData(TestJdbcDriver.java:530)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:154)
at junit.framework.TestCase.runBare(TestCase.java:127)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)

A co-worker did the same and the tests passed.  Both environments ran Ubuntu 
with Hadoop 0.20.2.

The tests we added to TestJdbcDriver were successful.
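
For reference, a hedged sketch of the kind of PreparedStatement usage this 
patch targets; the driver class, JDBC URL, and table name below are 
assumptions based on the era driver layout, not part of the patch itself:

{code}
// Illustrative only: a parameterized query through the Hive JDBC driver.
public class HivePreparedStatementDemo {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    java.sql.Connection conn = java.sql.DriverManager.getConnection(
        "jdbc:hive://localhost:10000/default", "", "");
    java.sql.PreparedStatement ps = conn.prepareStatement(
        "SELECT key, value FROM src WHERE key = ?");
    ps.setInt(1, 42); // one of the newly implemented setter methods
    java.sql.ResultSet rs = ps.executeQuery();
    while (rs.next()) {
      System.out.println(rs.getString(2));
    }
    conn.close();
  }
}
{code}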



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1528) JSON UDTF function

2010-08-12 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898034#action_12898034
 ] 

Paul Yang commented on HIVE-1528:
-

Looks good +1

> JSON UDTF function
> --
>
> Key: HIVE-1528
> URL: https://issues.apache.org/jira/browse/HIVE-1528
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1528.2.patch, HIVE-1528.patch
>
>
> Currently the only way to evaluate a path expression on a JSON object is 
> through get_json_object. If many fields of the JSON object need to be 
> extracted, we have to call this UDF multiple times. 
> There are many use cases where get_json_object needs to be called many times 
> in one query to convert the JSON object to a relational schema. It would be 
> much more desirable to have a JSON UDTF that supports the following syntax:
> {code}
> select a.id, b.*
> from a lateral view json_table(a.json_object, '$.f1',  '$.f2', ..., '$.fn') b 
> as f1, f2, ..., fn
> {code}
> where the json_table function scans the json_object only once and returns a 
> set of tuples (f1, f2, ..., fn).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1528) JSON UDTF function

2010-08-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1528:
-

Attachment: HIVE-1528.2.patch

Based on an offline discussion with Paul, added a new unit test that puts 
json_tuple in the select clause. Also removed the temporary changes in 
UDFJson.java.
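
For reference, a hedged usage sketch of the committed UDTF through the era 
JDBC driver; the driver class, URL, and table/column names are assumptions, 
and note that json_tuple takes plain field names rather than $-prefixed paths:

{code}
// Illustrative only: json_tuple in a lateral view, scanning the JSON once.
public class JsonTupleDemo {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    java.sql.Connection conn = java.sql.DriverManager.getConnection(
        "jdbc:hive://localhost:10000/default", "", "");
    java.sql.Statement stmt = conn.createStatement();
    java.sql.ResultSet rs = stmt.executeQuery(
        "SELECT a.id, b.f1, b.f2 "
        + "FROM a LATERAL VIEW json_tuple(a.json_object, 'f1', 'f2') b AS f1, f2");
    while (rs.next()) {
      System.out.println(rs.getString(2) + "\t" + rs.getString(3));
    }
    conn.close();
  }
}
{code}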

> JSON UDTF function
> --
>
> Key: HIVE-1528
> URL: https://issues.apache.org/jira/browse/HIVE-1528
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1528.2.patch, HIVE-1528.patch
>
>
> Currently the only way to evaluate a path expression on a JSON object is 
> through get_json_object. If many fields of the JSON object need to be 
> extracted, we have to call this UDF multiple times. 
> There are many use cases where get_json_object needs to be called many times 
> in one query to convert the JSON object to a relational schema. It would be 
> much more desirable to have a JSON UDTF that supports the following syntax:
> {code}
> select a.id, b.*
> from a lateral view json_table(a.json_object, '$.f1',  '$.f2', ..., '$.fn') b 
> as f1, f2, ..., fn
> {code}
> where the json_table function scans the json_object only once and returns a 
> set of tuples (f1, f2, ..., fn).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1531) Make Hive build work with Ivy versions < 2.1.0

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898032#action_12898032
 ] 

John Sichi commented on HIVE-1531:
--

+1.  Will commit when tests pass.


> Make Hive build work with Ivy versions < 2.1.0
> --
>
> Key: HIVE-1531
> URL: https://issues.apache.org/jira/browse/HIVE-1531
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1531.patch.txt
>
>
> Many projects in the Hadoop ecosystem still use Ivy 2.0.0 (including Hadoop 
> and Pig),
> yet Hive requires version 2.1.0. Ordinarily this would not be a problem, but 
> many users
> have a copy of an older version of Ivy in their $ANT_HOME directory, and this 
> copy will
> always get picked up in preference to what the Hive build downloads for 
> itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1522) replace columns should prohibit using partition column names.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1522:
--

Assignee: He Yongqiang

> replace columns should prohibit using partition column names.
> -
>
> Key: HIVE-1522
> URL: https://issues.apache.org/jira/browse/HIVE-1522
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> create table src_part_w(key int , value string) partitioned by (ds string, hr 
> int);
> alter table src_part_w  replace columns (key int, ds string, hr int, value 
> string);
> should not be allowed. Once the "alter table replace columns ..." is done, 
> all commands on this table will fail, and the schema cannot be changed back.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1532) Replace globStatus with listStatus inside Hive.java's replaceFiles.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1532:
--

Assignee: He Yongqiang

> Replace globStatus with listStatus inside Hive.java's replaceFiles.
> ---
>
> Key: HIVE-1532
> URL: https://issues.apache.org/jira/browse/HIVE-1532
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> globStatus expects a regular expression, so if there are special characters 
> (like '{' or '[') in the filepath, this function will fail.
> We should be able to replace this call with listStatus easily, since we are 
> not passing a regex to replaceFiles(). The only places replaceFiles is called 
> are loadPartition and Table's replaceFiles.
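
A minimal sketch of the proposed swap, assuming the caller wants the 
directory's children and no glob expansion:

{code}
// listStatus treats '{' and '[' as literal path characters, while a glob of
// the directory's children would try to expand them as a pattern and fail.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListInsteadOfGlob {
  static FileStatus[] listSrcs(FileSystem fs, Path srcDir) throws IOException {
    // was roughly: fs.globStatus(new Path(srcDir, "*"));  // pattern-sensitive
    return fs.listStatus(srcDir);  // plain listing, no pattern interpretation
  }
}
{code}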

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1535) alter partition should throw exception if the specified partition does not exist.

2010-08-12 Thread He Yongqiang (JIRA)
alter partition should throw exception if the specified partition does not 
exist.
-

 Key: HIVE-1535
 URL: https://issues.apache.org/jira/browse/HIVE-1535
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1515) archive is not working when multiple partitions inside one table are archived.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1515:
---

Assignee: (was: He Yongqiang)

> archive is not working when multiple partitions inside one table are archived.
> --
>
> Key: HIVE-1515
> URL: https://issues.apache.org/jira/browse/HIVE-1515
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
> Attachments: hive-1515.1.patch, hive-1515.2.patch
>
>
> set hive.exec.compress.output = true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size=256;
> set mapred.min.split.size.per.node=256;
> set mapred.min.split.size.per.rack=256;
> set mapred.max.split.size=256;
> set hive.archive.enabled = true;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", 
> hr="00");
> ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", 
> hr="001");
> select key, value, ds, hr from combine_3_srcpart_seq_rc where ds="2010-08-03" 
> order by key, hr limit 30;
> drop table combine_3_srcpart_seq_rc;
> will fail.
> java.io.IOException: Invalid file name: 
> har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
>  in 
> har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
> It fails because there are 2 input paths (one for each partition) for the 
> above query:
> 1): 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00
> 2): 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
> But when calling path.getFileSystem() for these 2 input paths, both return 
> the same FileSystem instance, which points to the first caller's path, in 
> this case 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
> The reason is that Hadoop's FileSystem has a global cache, and when loading a 
> FileSystem instance for a given path, it only takes the path's scheme and 
> username to look up the cache. So when we call Path.getFileSystem for the 
> second har path, it actually returns the filesystem handle for the first 
> path.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898030#action_12898030
 ] 

Edward Capriolo commented on HIVE-1530:
---

I like the default xml. Hive has many undocumented options, and new ones are 
being added often. Are end users going to know which jar the default.xml is 
in? Users won't want to extract a jar just to get the conf out of it in order 
to read the description of a setting.

As for what hadoop does... I personally find it annoying to have to navigate to 
hadoop/src/mapred/mapred-default.xml or to hadoop/src/hdfs/hdfs-default.xml to 
figure out what options I have for settings. So I do not really think we should 
just do it to be like hadoop if it makes people's lives harder.

If anything, please keep it as hive-site.xml.sample.


> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1515) archive is not working when multiple partitions inside one table are archived.

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1515:
---

Attachment: hive-1515.2.patch

Attaching a possible fix.

Talked with Namit and Paul this afternoon about this issue. There is actually a 
config which can disable the FileSystem cache: fs.%s.impl.disable.cache, where 
%s is the filesystem scheme; for archives, it's har.

So if you set "fs.har.impl.disable.cache" to true, the archive will work 
automatically. This should be the clean way to fix this issue.
In order to do this, you need to apply 
https://issues.apache.org/jira/browse/HADOOP-6231 if your hadoop does not 
include the code to disable the FileSystem cache.
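
A minimal sketch of the fix under those assumptions (HADOOP-6231 applied; 
paths shortened for illustration):

{code}
// With fs.har.impl.disable.cache=true, each getFileSystem() call returns a
// fresh HarFileSystem instead of the instance cached by scheme and username.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HarCacheDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.har.impl.disable.cache", true);
    FileSystem fs1 = new Path(
        "har:/warehouse/t/ds=2010-08-03/hr=00/data.har").getFileSystem(conf);
    FileSystem fs2 = new Path(
        "har:/warehouse/t/ds=2010-08-03/hr=001/data.har").getFileSystem(conf);
    // fs1 != fs2 now; with caching enabled, both would be the hr=00 instance.
    System.out.println(fs1 == fs2);
  }
}
{code}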

> archive is not working when multiple partitions inside one table are archived.
> --
>
> Key: HIVE-1515
> URL: https://issues.apache.org/jira/browse/HIVE-1515
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1515.1.patch, hive-1515.2.patch
>
>
> set hive.exec.compress.output = true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size=256;
> set mapred.min.split.size.per.node=256;
> set mapred.min.split.size.per.rack=256;
> set mapred.max.split.size=256;
> set hive.archive.enabled = true;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", 
> hr="00");
> ALTER TABLE combine_3_srcpart_seq_rc ARCHIVE PARTITION (ds="2010-08-03", 
> hr="001");
> select key, value, ds, hr from combine_3_srcpart_seq_rc where ds="2010-08-03" 
> order by key, hr limit 30;
> drop table combine_3_srcpart_seq_rc;
> will fail.
> java.io.IOException: Invalid file name: 
> har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
>  in 
> har:/data/users/heyongqiang/hive-trunk-clean/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
> It fails because there are 2 input paths (one for each partition) for the 
> above query:
> 1): 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00
> 2): 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001/data.har/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=001
> But when calling path.getFileSystem() for these 2 input paths, both return 
> the same FileSystem instance, which points to the first caller's path, in 
> this case 
> har:/Users/heyongqiang/Documents/workspace/Hive-Index/build/ql/test/data/warehouse/combine_3_srcpart_seq_rc/ds=2010-08-03/hr=00/data.har
> The reason is that Hadoop's FileSystem has a global cache, and when loading a 
> FileSystem instance for a given path, it only takes the path's scheme and 
> username to look up the cache. So when we call Path.getFileSystem for the 
> second har path, it actually returns the filesystem handle for the first 
> path.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1529) Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.

2010-08-12 Thread Pierre Huyn (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898022#action_12898022
 ] 

Pierre Huyn commented on HIVE-1529:
---

Hi Mayank,

Thanks for reviewing. Please bear with me, as this is my first time. I am 
looking at the checkstyle-errors.html file but I cannot find the problems you 
reported. The only thing I found is "File contains tab characters (this is the 
first instance)." on line 177.

Are there other log files I need to look at to find style errors? Are tab 
characters not allowed?

Regards
--- Pierre



> Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898012#action_12898012
 ] 

John Sichi commented on HIVE-1293:
--

Also, as a follow-up, we need to add client info such as hostname and process 
ID to SHOW LOCKS.


> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898010#action_12898010
 ] 

John Sichi commented on HIVE-1293:
--

After seeing some other issues, had a chat with Namit about semantics; here's 
what we worked out.

* Normally, locks should only be held for duration of statement execution.
* However, LOCK TABLE should take a global lock (not tied to any particular 
session or statement).
* UNLOCK TABLE should remove both kinds of lock (statement-level and global).  
Likewise, SHOW LOCKS shows all.
* For fetching results, we'll need a parameter to control whether a dirty read 
is possible.  Normally, this is not an issue since we're fetching from saved 
temp results, but when using select * from t to fetch directly from the 
original table, this behavior makes a difference.  To prevent dirty reads, 
we'll need the statement-level lock to span the duration of the fetch.

To avoid leaks, we need to make sure that once we create a ZooKeeper client, we 
always close it.
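
A minimal sketch of that close discipline, assuming the lock manager drives 
the standard ZooKeeper client API directly:

{code}
// Always close the ZooKeeper handle, even on error paths, so sessions
// (and any ephemeral lock nodes tied to them) are not leaked.
import org.apache.zookeeper.ZooKeeper;

public class LockSessionDemo {
  static void withLocks(String quorum) throws Exception {
    ZooKeeper zk = new ZooKeeper(quorum, 30000, null);
    try {
      // ... create/check/delete lock nodes here ...
    } finally {
      zk.close(); // guarantees the session is released
    }
  }
}
{code}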


> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1529) Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.

2010-08-12 Thread Mayank Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898003#action_12898003
 ] 

Mayank Lahiri commented on HIVE-1529:
-

Hi Pierre,

The numerical results appear to be accurate. A couple of comments about the 
code:

(1) Run "ant checkstyle" and looks at the formatting errors for your file in 
the build/checkstyle/checkstyle-errors.html file. In particular, remove 
commented lines like #160 of GenericUDAFCovariance.java, and newline-elses like 
line #214, unnecessary wraps #210-211

(2) Is there any reason for accepting string arguments in the Resolver class? 
If the user has a numeric value as a string, they can simply CAST(val AS 
DOUBLE) in the query. As it stands right now, passing junk strings as one of 
the input expressions causes a return value of NULL and a silent exception that 
is only visible in the log file. It might be better to simply not accept STRING 
types in the resolver, as in GenericUDAFHistogramNumeric.java. This would also 
mean that you don't have to test for a NumberFormatException in the iterate() 
method -- line #263 of GenericUDAFCovariance.java.

(3) Please add at least a little extended function info, line #59, see 
GenericUDAFHistogramNumeric.java or GenericUDAFnGrams.java for an example.

> Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1528) JSON UDTF function

2010-08-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang reassigned HIVE-1528:


Assignee: Ning Zhang

> JSON UDTF function
> --
>
> Key: HIVE-1528
> URL: https://issues.apache.org/jira/browse/HIVE-1528
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1528.patch
>
>
> Currently the only way to evaluate a path expression on a JSON object is 
> through get_json_object. If many fields of the JSON object need to be 
> extracted, we have to call this UDF multiple times. 
> There are many use cases where get_json_object needs to be called many times 
> in one query to convert the JSON object to a relational schema. It would be 
> much more desirable to have a JSON UDTF that supports the following syntax:
> {code}
> select a.id, b.*
> from a lateral view json_table(a.json_object, '$.f1',  '$.f2', ..., '$.fn') b 
> as f1, f2, ..., fn
> {code}
> where the json_table function scans the json_object only once and returns a 
> set of tuples (f1, f2, ..., fn).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897993#action_12897993
 ] 

Joydeep Sen Sarma commented on HIVE-1530:
-

removing the .xml files makes sense.

but users may want to modify the log4j.properties files. how would they do 
that in the new arrangement?

> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1528) JSON UDTF function

2010-08-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1528:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0
Fix Version/s: 0.7.0

> JSON UDTF function
> --
>
> Key: HIVE-1528
> URL: https://issues.apache.org/jira/browse/HIVE-1528
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1528.patch
>
>
> Currently the only way to evaluate a path expression on a JSON object is 
> through get_json_object. If many fields of the JSON object need to be 
> extracted, we have to call this UDF multiple times. 
> There are many use cases where get_json_object needs to be called many times 
> in one query to convert the JSON object to a relational schema. It would be 
> much more desirable to have a JSON UDTF that supports the following syntax:
> {code}
> select a.id, b.*
> from a lateral view json_table(a.json_object, '$.f1',  '$.f2', ..., '$.fn') b 
> as f1, f2, ..., fn
> {code}
> where the json_table function scans the json_object only once and returns a 
> set of tuples (f1, f2, ..., fn).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1528) JSON UDTF function

2010-08-12 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1528:
-

Attachment: HIVE-1528.patch

> JSON UDTF function
> --
>
> Key: HIVE-1528
> URL: https://issues.apache.org/jira/browse/HIVE-1528
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Ning Zhang
> Attachments: HIVE-1528.patch
>
>
> Currently the only way to evaluate a path expression on a JSON object is 
> through get_json_object. If many fields of the JSON object need to be 
> extracted, we have to call this UDF multiple times. 
> There are many use cases where get_json_object needs to be called many times 
> in one query to convert the JSON object to a relational schema. It would be 
> much more desirable to have a JSON UDTF that supports the following syntax:
> {code}
> select a.id, b.*
> from a lateral view json_table(a.json_object, '$.f1',  '$.f2', ..., '$.fn') b 
> as f1, f2, ..., fn
> {code}
> where the json_table function scans the json_object only once and returns a 
> set of tuples (f1, f2, ..., fn).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897988#action_12897988
 ] 

John Sichi commented on HIVE-1293:
--

Here's a scenario which is not working correctly.  (Tested with thrift server 
plus JDBC clients.)

Existing table foo.

Client 1:  LOCK TABLE foo EXCLUSIVE;

Client 2:  DROP TABLE foo;

According to the doc, the DROP TABLE should fail, but it succeeds.  Same is 
true for LOAD DATA.  Probably the same reason in both cases:  for these 
commands we don't register the output in the PREHOOK (only the POSTHOOK).  
INSERT is getting blocked correctly since it's in the PREHOOK.



> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-12 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1518:


Status: Patch Available  (was: Open)

> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases
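For contrast, a plain (uncontexted) top-k estimate with the existing ngrams()
UDAF would look like this sketch (the optional precision-factor argument is
omitted):

{code}
-- top-100 bigrams across all tweets, no context
SELECT ngrams(sentences(tweets), 2, 100) FROM twitter;
{code}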

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-12 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1518:


Attachment: HIVE-1518.2.patch

> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1529) Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.

2010-08-12 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--

Tags: ANSI SQL covariance aggregation function  (was: covariance 
aggregation function)

> Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1293:
-

Status: Open  (was: Patch Available)

> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query, or the write locks can be delayed till the move
> task (when the directory is actually moved). Care needs to be taken to avoid 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that this is not equivalent to snapshots:
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1529) Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.

2010-08-12 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--

Summary: Add ANSI SQL covariance aggregate functions: covar_pop and 
covar_samp.  (was: Add covariance aggregate function covar_pop and covar_samp)

> Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897977#action_12897977
 ] 

John Sichi commented on HIVE-1293:
--

Namit, I tried testing with a standalone zookeeper via CLI.  Locking a table 
succeeded, but then show locks didn't show anything, and unlock said the lock 
didn't exist.

I think the reason is that CLI is creating a new Driver for each statement 
executed, and when the old Driver is closed, the lock manager is closed along 
with it (closing the ZooKeeper client instance).  As a result, locks are 
released immediately after LOCK TABLE is executed.

When I tested with a thrift server plus two JDBC clients, all was well.  I was 
able to take a lock from one client and prevent the other client from getting 
the same lock.  So I guess the thrift server is keeping one Driver around per 
connection.
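A minimal CLI transcript of the failure mode, as a sketch (assumes the
ZooKeeper-based lock manager is enabled):

{code}
hive> LOCK TABLE foo SHARED;   -- succeeds, but the lock dies with the Driver
hive> SHOW LOCKS;              -- shows nothing
hive> UNLOCK TABLE foo;        -- fails: the lock no longer exists
{code}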


> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query, or the write locks can be delayed till the move
> task (when the directory is actually moved). Care needs to be taken to avoid 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that this is not equivalent to snapshots:
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-12 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1518:


Status: Open  (was: Patch Available)

Found the source of the bug we were discussing -- the v1 patch is correct, but 
will submit another patch with the "correct" way to do things.

> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-12 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--

Status: Patch Available  (was: Open)

This is the initial release of the covariance generic UDAFs, covar_pop and 
covar_samp.

> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-12 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--

Attachment: HIVE-1529.1.patch

This is the first release of two covariance generic UDAFs: population covariance 
covar_pop(x,y) and sample covariance covar_samp(x,y).

I am requesting a code review.
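Expected usage, as a sketch (table and column names here are hypothetical):

{code}
SELECT covar_pop(x, y), covar_samp(x, y) FROM covar_src;
{code}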

> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread Basab Maulik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897953#action_12897953
 ] 

Basab Maulik commented on HIVE-1293:


Re: One lib question: Zookeeper

hbase-handler with hbase 0.20.x does not work with zk 3.3.1 but works fine with 
the version it ships with, zk 3.2.2. Have not investigated what breaks.

> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query, or the write locks can be delayed till the move
> task (when the directory is actually moved). Care needs to be taken to avoid 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that this is not equivalent to snapshots:
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1512) Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version

2010-08-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1512:


Assignee: Basab Maulik  (was: John Sichi)

> Need to get hive_hbase-handler to work with hbase versions 0.20.4  0.20.5 and 
> cloudera CDH3 version
> ---
>
> Key: HIVE-1512
> URL: https://issues.apache.org/jira/browse/HIVE-1512
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Jimmy Hu
>Assignee: Basab Maulik
> Fix For: 0.7.0
>
> Attachments: HIVE-1512.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The current trunk hive_hbase-handler only works with hbase 0.20.3; we need 
> to get it to work with hbase versions 0.20.4 and 0.20.5 and the cloudera CDH3 version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1512) Need to get hive_hbase-handler to work with hbase versions 0.20.4 0.20.5 and cloudera CDH3 version

2010-08-12 Thread Basab Maulik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897948#action_12897948
 ] 

Basab Maulik commented on HIVE-1512:


Quick clarification: the current code works fine against HBase 0.20.5, which I 
tested (and presumably against 0.20.4 as well). Be sure to use the correct 
version of the ZooKeeper libs, 3.2.2.

A patch is needed to get it to build with hbase 0.89.0 snapshots.

> Need to get hive_hbase-handler to work with hbase versions 0.20.4  0.20.5 and 
> cloudera CDH3 version
> ---
>
> Key: HIVE-1512
> URL: https://issues.apache.org/jira/browse/HIVE-1512
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Jimmy Hu
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1512.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The current trunk hive_hbase-handler only works with hbase 0.20.3; we need 
> to get it to work with hbase versions 0.20.4 and 0.20.5 and the cloudera CDH3 version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897946#action_12897946
 ] 

John Sichi commented on HIVE-1293:
--

From testing:  the parsed lock mode seems to be case-sensitive:

hive> lock table blah shared;
Failed with exception No enum const class 
org.apache.hadoop.hive.ql.lockmgr.HiveLockMode.shared
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

If I use lock table blah SHARED it works.


> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query, or the write locks can be delayed till the move
> task (when the directory is actually moved). Care needs to be taken to avoid 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that this is not equivalent to snapshots:
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1529) Add covariance aggregate function covar_pop and covar_samp

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897921#action_12897921
 ] 

John Sichi commented on HIVE-1529:
--

"src" is a test fixture (automatically created for use by all tests).

For an example of how to add a test-specific dataset, see

ql/src/test/queries/clientpositive/nullscript.q

svn add your new file under hive-trunk/data/files.
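The resulting .q file would follow the usual pattern, roughly as below (table,
column, and file names here are hypothetical):

{code}
CREATE TABLE covar_src (x DOUBLE, y DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA LOCAL INPATH '../data/files/covar_src.txt' INTO TABLE covar_src;
SELECT covar_pop(x, y), covar_samp(x, y) FROM covar_src;
DROP TABLE covar_src;
{code}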


> Add covariance aggregate function covar_pop and covar_samp
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897917#action_12897917
 ] 

John Sichi commented on HIVE-1293:
--

I think ZK default client port would be 2181; see HBASE-2305.


> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query, or the write locks can be delayed till the move
> task (when the directory is actually moved). Care needs to be taken to avoid 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that this is not equivalent to snapshots:
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1533) Use ZooKeeper from maven

2010-08-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1533:
-

Fix Version/s: 0.7.0
Affects Version/s: 0.6.0

> Use ZooKeeper from maven
> 
>
> Key: HIVE-1533
> URL: https://issues.apache.org/jira/browse/HIVE-1533
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 0.6.0
>Reporter: Namit Jain
> Fix For: 0.7.0
>
>
> Zookeeper is now available from maven. Maybe we should delete the one in 
> hbase-handler/lib and get it via ivy instead of adding it in the top-level 
> lib? The version we have checked in is 3.2.2, but the maven availability is 
> 3.3.x, so we'd need to test to make sure everything (including hbase-handler) 
> still works with the newer version.
> http://mvnrepository.com/artifact/org.apache.hadoop/zookeeper

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897912#action_12897912
 ] 

John Sichi commented on HIVE-1534:
--

Assigning this to you in case you want to take a look at it together with 
HIVE-741.


> predicate pushdown does not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
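To see why pushing the AND-ed predicate is unsafe for outer joins, consider
this sketch of the intended semantics:

{code}
-- T1.c1 < 10 only restricts which T2 rows match; T1 rows with c1 >= 10
-- must still appear, padded with NULLs on the T2 side
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1 = T2.c2 AND T1.c1 < 10);

-- pushing the predicate down as a filter on T1 is only valid for this form
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1 = T2.c2) WHERE T1.c1 < 10;
{code}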

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-741) NULL is not handled correctly in join

2010-08-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-741:
---

Assignee: Amareshwari Sriramadasu  (was: Ning Zhang)

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
>
> With the following data in table input4_cb:
> Key    Value
> ---    -----
> NULL   325
> 18     NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.
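Under standard SQL semantics NULL never compares equal, so the join above
should behave as if the NULL keys had been filtered out first; a sketch of
the equivalent explicit form:

{code}
select * from input4_cb a join input4_cb b
on a.key = b.value
where a.key is not null and b.value is not null;
{code}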

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.20 #342

2010-08-12 Thread Apache Hudson Server
See 

--
[...truncated 15704 lines...]

init:

install-hadoopcore:

install-hadoopcore-default:

ivy-init-dirs:

ivy-download:
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
  [get] To: 

  [get] Not modified - so not downloaded

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-retrieve-hadoop-source:
[ivy:retrieve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
[ivy:retrieve] :: loading settings :: file = 

[ivy:retrieve] :: resolving dependencies :: 
org.apache.hadoop.hive#contrib;work...@minerva.apache.org
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  found hadoop#core;0.20.0 in hadoop-source
[ivy:retrieve] :: resolution report :: resolve 1515ms :: artifacts dl 1ms
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  default |   1   |   0   |   0   |   0   ||   1   |   0   |
-
[ivy:retrieve] :: retrieving :: org.apache.hadoop.hive#contrib
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  0 artifacts copied, 1 already retrieved (0kB/3ms)

install-hadoopcore-internal:

setup:

compile:
 [echo] Compiling: hbase-handler
[javac] 
:271:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

compile-test:
[javac] 
:304:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 4 source files to 

[javac] Note: 

 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 
:317:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 2 source files to 


jar:
 [echo] Jar: hbase-handler

test-jar:
  [jar] Building MANIFEST-only jar: 


test:
[junit] Running org.apache.hadoop.hive.cli.TestHBaseCliDriver
[junit] org.apache.hadoop.hbase.client.NoServerForRegionException: Timed 
out trying to locate root region
[junit] at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:976)
[junit] at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
[junit] at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:607)
[junit] at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:738)
[junit] at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
[junit] at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
[junit] at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:128)
[junit] at 
org.apache.hadoop.hive.hbase.HBaseTestSetup.setUpFixtures(HBaseTestSetup.java:87)
[junit] at 
org.apache.hadoop.hive.hbase.HBaseTestSetup.preTest(HBaseTestSetup.java:59)
[junit] at 
org.apache.hadoop.hive.hbase.HBaseQTestUtil.<init>(HBaseQTestUtil.java:31)
[junit] at 
org.apache.hadoop.hive.cli.TestHBaseCliDriver.setUp(TestHBaseCliDriver.java:43)
[junit] at junit.framework.TestCase.runBare(TestCase.java:125)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit] at junit.framework.TestResult.run(TestResult.java:109)
[junit] at junit.framework.TestCase.run(TestCase.java:118)
[junit] at junit.framework.TestSuite

[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897909#action_12897909
 ] 

John Sichi commented on HIVE-1534:
--

Definitely a bug.  It happens regardless of the setting of hive.optimize.ppd, 
so it probably has something to do with the way the join condition is 
decomposed rather than the predicate pushdown optimization.
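For reference, the toggle used when checking this (a standard Hive session
setting):

{code}
set hive.optimize.ppd=false;
{code}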

> predicate pushdown does not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1534) predicate pushdown does not work correctly with outer joins

2010-08-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1534:


Assignee: Amareshwari Sriramadasu

> predicate pushdown does not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1460) JOIN should not output rows for NULL values

2010-08-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-1460.
--

Resolution: Duplicate

Closing as dup.


> JOIN should not output rows for NULL values
> ---
>
> Key: HIVE-1460
> URL: https://issues.apache.org/jira/browse/HIVE-1460
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Zheng Shao
>
> We should filter out rows with NULL keys from the result of this query
> {code}
> SELECT * FROM a JOIN b on a.key = b.key
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897892#action_12897892
 ] 

John Sichi commented on HIVE-1482:
--

Yeah, the problems may only show up with remoting (which is why I mentioned 
having the test run against a thrift server instead of embedded).


> Not all jdbc calls are threadsafe.
> --
>
> Key: HIVE-1482
> URL: https://issues.apache.org/jira/browse/HIVE-1482
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Drivers
>Affects Versions: 0.7.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1482-1.patch
>
>
> As per jdbc spec they should be threadsafe:
> http://download.oracle.com/docs/cd/E17476_01/javase/1.3/docs/guide/jdbc/spec/jdbc-spec.frame9.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-1495.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks Yongqiang!


> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
> hive-1495.4.patch, hive-1495.5.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #519

2010-08-12 Thread Apache Hudson Server
See 

--
[...truncated 13491 lines...]
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 


Hudson build is back to normal : Hive-trunk-h0.17 #516

2010-08-12 Thread Apache Hudson Server
See 




Re: hive version compatibility

2010-08-12 Thread Edward Capriolo
Jaydeep,

Currently one build of hive works with hadoop 0.17,0.18,0.19,and 0.20.
However there is talk about dropping support for older versions and
moving completely to mapreduce api.

Edward

On Thu, Aug 12, 2010 at 8:29 AM, jaydeep vishwakarma
 wrote:
> Hi,
>
> I found a very interesting feature in hive version 0.6.0. Is there any
> compatibility constraint with hadoop? If yes, then which hadoop versions
> does it support?
>
> Regards,
> Jaydeep
>
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify us
> immediately by responding to this email and then delete it from your system.
> The firm is neither liable for the proper and complete transmission of the
> information contained in this communication nor for any delay in its
> receipt.
>


Re: [DISCUSSION] Move to become a TLP

2010-08-12 Thread Edward Capriolo
On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo  wrote:
> Folks,
>
> This question has come up in the PMC once again and would be great to hear 
> once more on this topic. What do people think? Are we ready to become a TLP?
>
> Thanks,
> Ashish

I thought of one more benefit. We can rename our packages from

org.apache.hadoop.hive.*
to
org.apache.hive.*

:)


[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

2010-08-12 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897742#action_12897742
 ] 

Bennie Schut commented on HIVE-1482:


I've tried, but it is actually surprisingly difficult to reproduce the failure in 
a test on TestJdbcDriver. Perhaps there is something synchronized about the use 
of the embedded mode the test is running in?

> Not all jdbc calls are threadsafe.
> --
>
> Key: HIVE-1482
> URL: https://issues.apache.org/jira/browse/HIVE-1482
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Drivers
>Affects Versions: 0.7.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1482-1.patch
>
>
> As per jdbc spec they should be threadsafe:
> http://download.oracle.com/docs/cd/E17476_01/javase/1.3/docs/guide/jdbc/spec/jdbc-spec.frame9.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



hive version compatibility

2010-08-12 Thread jaydeep vishwakarma

Hi,

I found a very interesting feature in hive version 0.6.0. Is there any
compatibility constraint with hadoop? If yes, then which hadoop versions
does it support?

Regards,
Jaydeep

The information contained in this communication is intended solely for the use 
of the individual or entity to whom it is addressed and others authorized to 
receive it. It may contain confidential or legally privileged information. If 
you are not the intended recipient you are hereby notified that any disclosure, 
copying, distribution or taking any action in reliance on the contents of this 
information is strictly prohibited and may be unlawful. If you have received 
this communication in error, please notify us immediately by responding to this 
email and then delete it from your system. The firm is neither liable for the 
proper and complete transmission of the information contained in this 
communication nor for any delay in its receipt.


Filter Operator applied twice on a where clause?

2010-08-12 Thread Amareshwari Sri Ramadasu
Hi,

I see that if a query has a where clause, the FilterOperator is applied twice. 
Can you tell me why it is done so?
It seems the second operator always filters zero rows.

Explain on a query with where clause :
hive> explain select * from input1 where input1.key != 10;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION 
(TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= 
(. (TOK_TABLE_OR_COL input1) key) 10))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
input1
  TableScan
alias: input1
Filter Operator
  predicate:
  expr: (key <> 10)
  type: boolean
  Filter Operator
predicate:
expr: (key <> 10)
type: boolean
Select Operator
  expressions:
expr: key
type: int
expr: value
type: int
  outputColumnNames: _col0, _col1
  File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
Fetch Operator
  limit: -1

I see the same from the Mapper logs also. The first FilterOperator does the 
filtering and the second operator always filters zero rows.

2010-08-12 14:33:22,149 INFO ExecMapper: operator tree:
MAP (Id = 5) -> TS (Id = 0) -> FIL (Id = 1) -> FIL (Id = 2) -> SEL (Id = 3) -> FS (Id = 4)

2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
forwarding 1 rows
2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 forwarding 1 rows
2010-08-12 14:33:22,450 INFO ExecMapper: ExecMapper: processing 1 rows: used 
memory = 4417072
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
DESERIALIZE_ERRORS:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 
forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
FILTERED:1
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 
forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
FILTERED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 
forwarded 0 rows
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Final Path: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/00_0
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Writing to temp file: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/_tmp.00_0
2010-08-12 14:33:22,454 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
New Final Path: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/00_0
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
Close done
2010-08-12 14:33:22,485 INFO org.apache.hadoop.hive.ql.ex

[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-08-12 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897625#action_12897625
 ] 

HBase Review Board commented on HIVE-675:
-

Message from: "Carl Steinbach" 


bq.  On 2010-08-11 06:57:04, namit jain wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g, line 572
bq.  > 
bq.  >
bq.  > remove KW_SCHEMAS

In MySQL SHOW SCHEMAS is a synonym for SHOW DATABASES as of version 5.0.2. I 
think the general convention in Hive is to mimic the behavior of MySQL. Please 
let me know if there is a reason why we should not do that here.

http://dev.mysql.com/doc/refman/5.0/en/show-databases.html
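As a sketch of the proposed synonym behavior:

{code}
SHOW DATABASES;
SHOW SCHEMAS;   -- same output; SCHEMAS is a synonym, as in MySQL >= 5.0.2
{code}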
 


- Carl


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/508/#review838
---





> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-7-16.patch.txt, 
> HIVE-675-2010-8-4.patch.txt
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Review Request: HIVE-675: Add database/scheme support Hive QL

2010-08-12 Thread Carl Steinbach


> On 2010-08-11 06:57:04, namit jain wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g, line 572
> > 
> >
> > remove KW_SCHEMAS

In MySQL SHOW SCHEMAS is a synonym for SHOW DATABASES as of version 5.0.2. I 
think the general convention in Hive is to mimic the behavior of MySQL. Please 
let me know if there is a reason why we should not do that here.

http://dev.mysql.com/doc/refman/5.0/en/show-databases.html
 


- Carl


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/508/#review838
---


On 2010-08-04 15:34:31, Carl Steinbach wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/508/
> ---
> 
> (Updated 2010-08-04 15:34:31)
> 
> 
> Review request for Hive Developers.
> 
> 
> Summary
> ---
> 
> Database/Scheme support for Hive.
> 
> * Implemented 'SHOW DATABASES' command
> * Refactored TestHiveMetaStore and enabled tests for remote metastore client.
> * Added launch configurations for TestHiveMetaStore and 
> TestHiveMetaStoreRemote
> 
> 
> This addresses bug HIVE-675.
> http://issues.apache.org/jira/browse/HIVE-675
> 
> 
> Diffs
> -
> 
>   build-common.xml d4ff895 
>   eclipse-templates/TestHive.launchtemplate 24efc12 
>   eclipse-templates/TestHiveMetaStore.launchtemplate PRE-CREATION 
>   eclipse-templates/TestHiveMetaStoreRemote.launchtemplate PRE-CREATION 
>   metastore/if/hive_metastore.thrift 478d0af 
>   metastore/src/gen-cpp/ThriftHiveMetastore.h e2538fb 
>   metastore/src/gen-cpp/ThriftHiveMetastore.cpp f945a3a 
>   metastore/src/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp ed2bb99 
>   metastore/src/gen-cpp/hive_metastore_types.h 1b0c706 
>   metastore/src/gen-cpp/hive_metastore_types.cpp b5a403d 
>   
> metastore/src/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java 
> 78c78d9 
>   
> metastore/src/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
>  25408d9 
>   metastore/src/gen-php/ThriftHiveMetastore.php ea4add5 
>   metastore/src/gen-php/hive_metastore_types.php 61872a0 
>   metastore/src/gen-py/hive_metastore/ThriftHiveMetastore-remote fc06cba 
>   metastore/src/gen-py/hive_metastore/ThriftHiveMetastore.py 4a0bc67 
>   metastore/src/gen-py/hive_metastore/ttypes.py ea7269e 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 
> 39dbd52 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> 4fb296a 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
> c6541af 
>   metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
> 6013644 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> 0818689 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> a06384c 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 4951bd6 
>   metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 4488f94 
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MDatabase.java 
> b3e098d 
>   metastore/src/model/package.jdo 206ba75 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 
> fff6aad 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreBase.java
>  PRE-CREATION 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreRemote.java
>  bc950b9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java bc268a4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java d59f48c 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 04dd9e3 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 2ecda01 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> eedf9e3 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 0484c91 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 02bf926 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 70cd05f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 
> eb079aa 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateDatabaseDesc.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java ed4ed22 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/DropDatabaseDesc.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ShowDatabasesDesc.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SwitchDatabaseDesc.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java b4651a2 
>   ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java ab39ca4 
>   
> ql/src/test/org/apache/hadoop/hive/

[jira] Commented: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897612#action_12897612
 ] 

John Sichi commented on HIVE-1495:
--

Rerunning with latest.


> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
> hive-1495.4.patch, hive-1495.5.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1495) supply correct information to hooks and lineage for index rebuild

2010-08-12 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1495:
---

Attachment: hive-1495.5.patch

Sorry, forgot to update outputs for these two testcases. Will be more careful 
next time.

> supply correct information to hooks and lineage for index rebuild
> -
>
> Key: HIVE-1495
> URL: https://issues.apache.org/jira/browse/HIVE-1495
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1495.1.patch, hive-1495.2.patch, hive-1495.3.patch, 
> hive-1495.4.patch, hive-1495.5.patch
>
>
> This is a followup for HIVE-417.  
> Ashish can probably help on how this should work.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: SAXParseException on local mode?

2010-08-12 Thread Joydeep Sen Sarma
Hi Bennie -

- i changed the default for this option to be false in a recent commit.
- what version of java/hadoop are you running? it seems that the job.xml can get 
parsed on your hadoop nodes, but not on the client machine - so there must be some 
difference in the xml parsing library (that's part of the hadoop distro i 
assume)

Joydeep

From: Bennie Schut [bsc...@ebuddy.com]
Sent: Wednesday, August 04, 2010 12:16 AM
To: hive-dev@hadoop.apache.org
Subject: SAXParseException on local mode?

I seem to get this error when hive decides to use local mode. If I
disable it the problem is fixed: "set hive.exec.mode.local.auto=false;"
I was running a large integration test so I'm not exactly sure which
calls to make to reproduce this, but perhaps someone else knows what's
going on?

java.lang.RuntimeException: org.xml.sax.SAXParseException: Content is
not allowed in trailing section.
at
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1168)
at
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1040)
at
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:382)
at
org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1662)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:215)
10/08/03 15:40:36 INFO parse.ParseDriver: Parsing command: show
functions loglinecleanup
at
org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:93)
at
org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
10/08/03 15:40:36 INFO parse.ParseDriver: Parse Completed
at
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:602)
at
org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1021)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
10/08/03 15:40:36 INFO ql.Driver: Semantic Analysis Completed
Caused by: org.xml.sax.SAXParseException: Content is not allowed in
trailing section.
at
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
at
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
at
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1092)
... 16 more
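
One way to test the parser-library theory is to print the concrete JAXP
implementation each JVM resolves; if the client machine and the hadoop nodes
report different classes, that would explain why job.xml parses on one but not
the other. A minimal sketch (the class name XmlParserCheck is hypothetical),
meant to be compiled and run on both machines:

import javax.xml.parsers.DocumentBuilderFactory;

public class XmlParserCheck {
    public static void main(String[] args) throws Exception {
        // Print which JAXP DocumentBuilderFactory/DocumentBuilder this JVM
        // resolves; differing implementations can differ in strictness about
        // trailing content after the XML document element.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("factory: " + factory.getClass().getName());
        System.out.println("builder: " + factory.newDocumentBuilder().getClass().getName());
    }
}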



RE: How HIVE manages a join

2010-08-12 Thread Joydeep Sen Sarma
i hate this message: 'THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join 
Syntax'

why must edits to the wiki be banned if there are xdocs? hadoop has both.

there will always be things that are not captured in xdocs. it's pretty sad to
discourage free-form edits by people who want to contribute without checking
out the source. (what is this - the 80s?)

From: Edward Capriolo [edlinuxg...@gmail.com]
Sent: Tuesday, August 10, 2010 2:57 PM
To: hive-u...@hadoop.apache.org
Cc: hive-dev@hadoop.apache.org
Subject: Re: How HIVE manages a join

Sorry.
$hive_root/docs/xdocs/language_manual/joins.xml

On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo  wrote:
> This page is already in version control.
>
> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
>
> Edward
>
> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach  wrote:
>> Hi Yongqiang,
>> Please go ahead and update the wiki page. I will copy it over to version
>> control when you are done.
>> Thanks.
>> Carl
>>
>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he 
>> wrote:
>>>
>>> In the Hive Join wiki page, it says
>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>>
>>> Where should i do the update?
>>>
>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he 
>>> wrote:
>>> > Yeah. The sort merge bucket mapjoin has been finished for some time,
>>> > and seems stable now. I did one skew join but haven't gotten a chance
>>> > to look at another skew join Namit mentioned to me. But I definitely
>>> > should have updated the wiki earlier. My bad.
>>> >
>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher 
>>> > wrote:
>>> >> Yongqiang mentioned he was going to update the wiki with this
>>> >> information in
>>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>> >>
>>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket
>>> >> map
>>> >> join and the other skew join you mention in the above thread?
>>> >>
>>> >> Thanks,
>>> >> Jeff
>>> >>
>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>>> >>  wrote:
>>> >>>
>>> >>> Roberto ..
>>> >>>
>>> >>> You can find these links useful ..
>>> >>>
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>> >>> - Simple joins and optimizations..
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  
>>> >>> -
>>> >>> New kind of joins / features of hive ..
>>> >>>
>>> >>> Thanks
>>> >>>
>>> >>> Bharath.V
>>> >>> 4th year Undergraduate..
>>> >>> IIIT Hyderabad
>>> >>>
>>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>>> >>>  wrote:
>>> 
>>>  Hi,
>>> 
>>>  I cannot find any documentation about the algorithm Hive uses to
>>>  translate JOIN clauses into Map-Reduce tasks.
>>> 
>>>  In particular, if I have two tables A and B, each table is written to
>>>  a separate file and each file is split across hadoop nodes. When I
>>>  perform a JOIN with A.column = B.column, the framework has to compare
>>>  the full data from the first file with the full data from the second
>>>  file. In order to scan all possible combinations of values, how can
>>>  hadoop do this? If each node contains only a portion of each file, a
>>>  complete comparison seems impossible. Is one of the two files entirely
>>>  replicated on each node? Or does Hive use another kind of
>>>  strategy/optimization?
>>> 
>>>  Thanks.
>>> >>
>>> >>
>>> >
>>
>>
>
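
For reference, the common (reduce-side) join works roughly as follows: neither
file is replicated. Each map task tags every row with its source table and
emits it keyed on the join column; the shuffle then routes all rows sharing a
key to the same reducer, which forms the per-key cross product. A minimal
plain-Java sketch of that idea, with the shuffle simulated in-process (all
class and variable names are hypothetical, not Hive code):

import java.util.*;

public class ReduceSideJoinSketch {
    // Simulates the shuffle: group emitted {tag, joinKey, payload} records
    // by join key, as Hadoop does between the map and reduce phases.
    static Map<String, List<String[]>> shuffle(List<String[]> taggedRows) {
        Map<String, List<String[]>> groups = new HashMap<>();
        for (String[] row : taggedRows) {
            groups.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Each record a map task would emit: {sourceTag, joinKey, payload}.
        List<String[]> emitted = Arrays.asList(
            new String[]{"A", "1", "a-row-1"},
            new String[]{"B", "1", "b-row-1"},
            new String[]{"A", "2", "a-row-2"},
            new String[]{"B", "2", "b-row-2"},
            new String[]{"B", "2", "b-row-3"});

        // "Reduce" phase: per key, cross A-tagged rows with B-tagged rows.
        for (Map.Entry<String, List<String[]>> e : shuffle(emitted).entrySet()) {
            List<String> aRows = new ArrayList<>();
            List<String> bRows = new ArrayList<>();
            for (String[] row : e.getValue()) {
                (row[0].equals("A") ? aRows : bRows).add(row[2]);
            }
            for (String a : aRows) {
                for (String b : bRows) {
                    System.out.println("key " + e.getKey() + ": " + a + " JOIN " + b);
                }
            }
        }
    }
}

Because matching rows only need to meet at the reducer that owns their join
key, no node ever needs a full copy of either table for this strategy; the map
join (discussed in the next thread) is the exception, where the small table is
replicated to the mappers.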


RE: how jdbm is used in map join

2010-08-12 Thread Joydeep Sen Sarma
i believe each mapper makes a copy, since each one reads in the data to be
loaded into the dbm.

this needs to be optimized at some point (ideally we should be putting the dbm
in the distributed cache).

From: Gang Luo [lgpub...@yahoo.com.cn]
Sent: Tuesday, August 10, 2010 3:04 PM
To: hive-dev@hadoop.apache.org
Subject: how jdbm is used in map join

Hi all,
Hive uses JDBM for the replicated table in a map join. When multiple map tasks
are running on the same node, will there be multiple copies of the JDBM file
generated, or will all the map tasks share the same copy? If it is the latter,
which mapper generates the file, and how are the other mappers synchronized?

Thanks,
-Gang
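
For context, the map join itself amounts to a per-mapper hash join: the small
table is materialized into a lookup structure (JDBM-backed when it cannot stay
in memory), and the big table's rows stream through the mapper and probe it,
so no shuffle or reduce phase is needed. A minimal plain-Java sketch of the
probe side under those assumptions (all names hypothetical, not Hive code):

import java.util.HashMap;
import java.util.Map;

public class MapJoinSketch {
    public static void main(String[] args) {
        // The small table, loaded once per mapper into an in-memory map;
        // per the discussion above, each mapper currently builds its own copy.
        Map<String, String> small = new HashMap<>();
        small.put("1", "dim-row-1");
        small.put("2", "dim-row-2");

        // Big-table rows stream through the mapper and probe the hash table;
        // matches are emitted directly - no shuffle or reduce phase needed.
        String[][] bigRows = {{"1", "fact-a"}, {"2", "fact-b"}, {"3", "fact-c"}};
        for (String[] row : bigRows) {
            String match = small.get(row[0]);
            if (match != null) {
                System.out.println(row[1] + " JOIN " + match);
            }
        }
    }
}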