Deserializing map column via JDBC (HIVE-1378)

2010-08-13 Thread Steven Wong
Trying to work on HIVE-1378. My first step is to get the Hive JDBC driver to 
return actual values for mapcol in the result set of "select mapcol, bigintcol, 
stringcol from foo", where mapcol is a map column, instead of 
the current behavior of complaining that mapcol's column type is not recognized.

I changed HiveResultSetMetaData.{getColumnType,getColumnTypeName} to recognize 
the map type, but then the returned value for mapcol is always {}, even though 
mapcol does contain some key-value entries. Turns out this is happening in 
HiveQueryResultSet.next:


1.   The call to client.fetchOne returns the string "{"a":"b","x":"y"}   
123 abc".

2.   The serde (DynamicSerDe ds) deserializes the string to the list 
[{},123,"abc"].

The serde cannot correctly deserialize the map because apparently the map is 
not in the serde's expected serialization format. The serde has been 
initialized with TCTLSeparatedProtocol.

Should we make client.fetchOne return a ctrl-separated string? Or should we use 
a different serde/format in HiveQueryResultSet? It seems the first way is 
right; correct me if that's wrong. And how do we do that?

Thanks.
Steven
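For concreteness, a ctrl-separated version of that same row could be parsed as in the sketch below. The delimiter choices (\001 between columns, \002 between map entries, \003 between a key and its value) are Hive's customary defaults and are an assumption here, not taken from the actual TCTLSeparatedProtocol configuration in HiveQueryResultSet:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: parses one row in a ctrl-separated format
// (\001 between columns, \002 between map entries, \003 between a
// key and its value). The delimiters are assumed defaults, not read
// from the serde initialization under discussion.
public class CtlSeparatedRow {
    public static Map<String, String> parseMapColumn(String col) {
        Map<String, String> m = new LinkedHashMap<String, String>();
        if (col.isEmpty()) {
            return m;
        }
        for (String entry : col.split("\u0002")) {
            String[] kv = entry.split("\u0003", 2);
            m.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return m;
    }

    public static void main(String[] args) {
        // A ctrl-separated equivalent of the row that fetchOne currently
        // returns as a JSON-style string:
        String row = "a\u0003b\u0002x\u0003y" + "\u0001" + "123" + "\u0001" + "abc";
        String[] cols = row.split("\u0001");
        System.out.println(parseMapColumn(cols[0])); // prints {a=b, x=y}
    }
}
```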



[jira] Updated: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore

2010-08-13 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1428:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to 0.6.


> ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
> --
>
> Key: HIVE-1428
> URL: https://issues.apache.org/jira/browse/HIVE-1428
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Paul Yang
>Assignee: Pradeep Kamath
> Fix For: 0.6.0
>
> Attachments: HIVE-1428-0.6.0-patch.txt, HIVE-1428-2.patch, 
> HIVE-1428-3.patch, HIVE-1428.patch, TestHiveMetaStoreRemote.java
>
>
> If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD 
> PARTITION commands will fail with an error similar to the following:
> [prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e "ALTER TABLE 
> mytable add partition(datestamp = '20091101', srcid = '10',action) location 
> '/user/pradeepk/mytable/20091101/10';"
> 10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found 
> in the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> Hive history 
> file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
> FAILED: Error in metadata: org.apache.thrift.TApplicationException: 
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> [prade...@chargesize:~/dev/howl]
> This is due to a check that tries to retrieve the partition to see if it 
> exists. If it does not, an attempt is made to pass a null value from the 
> metastore. Since thrift does not support null return values, an exception is 
> thrown.
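The fix pattern the description implies is the usual one for Thrift services: the server throws a declared exception instead of returning null, and the client translates that exception back into "does not exist". The class and method names below are hypothetical, chosen only to illustrate the pattern, and are not Hive's actual metastore API:

```java
// Hypothetical sketch of the "no nulls over Thrift" pattern: the
// server side throws a declared exception rather than returning null,
// and the client side maps the exception to an existence check.
// Names here are illustrative, not taken from the Hive metastore.
public class NullFreeLookup {
    static class NoSuchObjectException extends Exception {}

    // Server side: never return null across the wire.
    public static String getPartition(String name) throws NoSuchObjectException {
        if (!"20091101".equals(name)) {
            throw new NoSuchObjectException();
        }
        return name;
    }

    // Client side: translate the exception into "partition is absent".
    public static boolean partitionExists(String name) {
        try {
            getPartition(name);
            return true;
        } catch (NoSuchObjectException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(partitionExists("20091101")); // prints true
        System.out.println(partitionExists("20091102")); // prints false
    }
}
```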

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1529) Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.

2010-08-13 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--

Status: In Progress  (was: Patch Available)

Updated patch ready for review

> Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch, HIVE-1529.2.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.
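For reference, the population covariance is sum((x_i - mean(x)) * (y_i - mean(y))) / n, while the sample covariance divides by n - 1. A minimal one-pass Java sketch of the two definitions (illustrative only, not the implementation in the attached patches):

```java
// Illustrative one-pass covariance computation: covar_pop divides by
// n, covar_samp by n - 1. Not the code from the HIVE-1529 patches.
public class Covariance {
    public static double covarPop(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxy += x[i] * y[i];
        }
        // E[xy] - E[x]E[y]
        return sxy / n - (sx / n) * (sy / n);
    }

    public static double covarSamp(double[] x, double[] y) {
        int n = x.length;
        // Bessel's correction: scale the population value by n/(n-1).
        return covarPop(x, y) * n / (n - 1);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 6};
        System.out.println(covarPop(x, y));  // population covariance, 4/3
        System.out.println(covarSamp(x, y)); // sample covariance, 2
    }
}
```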

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1529) Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.

2010-08-13 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--

  Status: Patch Available  (was: In Progress)
Release Note: New patch available for review.

> Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch, HIVE-1529.2.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Hive Contributors Meeting August 9th @ Facebook

2010-08-13 Thread Carl Steinbach
I found and removed one of these comments on the CLI page. Ed would know if
he added them anywhere else.

Thanks.

Carl

On Fri, Aug 13, 2010 at 2:06 PM, John Sichi  wrote:

> Hi Carl and Ed,
>
> The "conversion to xdocs" warnings on the wiki are causing a bit of
> confusion within Facebook for people trying to be good citizens and update
> the wiki as they find mistakes.
>
> Could someone remove the warnings for now?  Once the final doc conversion
> plan is published, I'll circulate that within Facebook.
>
> Thanks,
> JVS
>
>


[jira] Commented: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-13 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898425#action_12898425
 ] 

HBase Review Board commented on HIVE-1518:
--

Message from: "John Sichi" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/644/#review902
---



http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java


Shouldn't name="context_ngrams"?




http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java


Here and elsewhere, why do you cast to ArrayList?  Can't you just use the 
List interface?  I don't think Hive makes any guarantee that an ArrayList will 
always be returned, even if that is currently the case.
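The review point above can be shown with a toy example (made-up data, not the UDAF code): a method that only reads elements should accept the List interface, because a cast to ArrayList breaks the moment the caller passes any other List implementation.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration of "program to the List interface": the cast-based
// version fails for perfectly valid List implementations.
public class ListInterfaceDemo {
    // Brittle: throws ClassCastException for any non-ArrayList List.
    public static int sizeViaCast(Object o) {
        return ((java.util.ArrayList<?>) o).size();
    }

    // Robust: works for every List implementation.
    public static int sizeViaInterface(List<?> list) {
        return list.size();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("i", "love", "hive");
        System.out.println(sizeViaInterface(words)); // prints 3
        // sizeViaCast(words) would throw ClassCastException here:
        // Arrays.asList returns a List that is not java.util.ArrayList.
    }
}
```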




http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NGramEstimator.java


(I missed this in an earlier review):  can you use generics here to specify 
the HashMap types and avoid so much casting?



http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NGramEstimator.java


specify Comparator to avoid casting
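Both generics points can be sketched together on made-up n-gram counts (this is not the NGramEstimator code): a parameterized HashMap plus an explicitly typed Comparator let a frequency sort run without a single cast.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the review comments: typed collections and a
// typed Comparator remove all casting from a sort-by-count.
public class GenericsDemo {
    public static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        // Typed Comparator: descending by count, no raw types, no casts.
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return b.getValue().compareTo(a.getValue());
            }
        });
        return entries.subList(0, Math.min(k, entries.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("i love hive", 5);
        counts.put("i love tea", 2);
        counts.put("i love sql", 9);
        System.out.println(topK(counts, 2).get(0).getKey()); // prints i love sql
    }
}
```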


- John





> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-13 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898421#action_12898421
 ] 

HBase Review Board commented on HIVE-1518:
--

Message from: "John Sichi" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/644/
---

Review request for Hive Developers.


Summary
---

review by JVS


This addresses bug HIVE-1518.
http://issues.apache.org/jira/browse/HIVE-1518


Diffs
-

  http://svn.apache.org/repos/asf/hadoop/hive/trunk/data/files/text-en.txt 
985013 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 985013 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java
 985013 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NGramEstimator.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/queries/clientpositive/udaf_context_ngrams.q
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/queries/clientpositive/udaf_ngrams.q
 985013 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/results/clientpositive/show_functions.q.out
 985013 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/results/clientpositive/udaf_context_ngrams.q.out
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/test/results/clientpositive/udaf_ngrams.q.out
 985013 

Diff: http://review.cloudera.org/r/644/diff


Testing
---


Thanks,

John




> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-13 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1518:
-

Status: Open  (was: Patch Available)

Submitted a review here:

https://review.cloudera.org/r/644/

Some of my comments are on existing code which is being moved as part of this 
patch; consider them retroactive since I should have made them on the original 
ngrams patch.

In general, see if you can use generics for collections wherever possible.


> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Hive Contributors Meeting August 9th @ Facebook

2010-08-13 Thread John Sichi
Hi Carl and Ed,

The "conversion to xdocs" warnings on the wiki are causing a bit of confusion 
within Facebook for people trying to be good citizens and update the wiki as 
they find mistakes.

Could someone remove the warnings for now?  Once the final doc conversion plan 
is published, I'll circulate that within Facebook.

Thanks,
JVS



[jira] Updated: (HIVE-1529) Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.

2010-08-13 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1529:
--

Attachment: HIVE-1529.2.patch

Implemented all feedback from reviewer.

> Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch, HIVE-1529.2.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [DISCUSSION] Move to become a TLP

2010-08-13 Thread Ashish Thusoo
Nice one Ed...

Folks,

Please chime in. I think we should close this out next week one way or the 
other. We can consider this a vote at this point, so please vote on this issue.

Thanks,
Ashish 

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Thursday, August 12, 2010 8:05 AM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo  wrote:
> Folks,
>
> This question has come up in the PMC once again and would be great to hear 
> once more on this topic. What do people think? Are we ready to become a TLP?
>
> Thanks,
> Ashish

I thought of one more benefit. We can rename our packages from

org.apache.hadoop.hive.*
to
org.apache.hive.*

:)


[jira] Updated: (HIVE-1531) Make Hive build work with Ivy versions < 2.1.0

2010-08-13 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1531:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed to branch and trunk.  Thanks Carl!


> Make Hive build work with Ivy versions < 2.1.0
> --
>
> Key: HIVE-1531
> URL: https://issues.apache.org/jira/browse/HIVE-1531
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1531.patch.txt
>
>
> Many projects in the Hadoop ecosystem still use Ivy 2.0.0 (including Hadoop 
> and Pig),
> yet Hive requires version 2.1.0. Ordinarily this would not be a problem, but 
> many users
> have a copy of an older version of Ivy in their $ANT_HOME directory, and this 
> copy will
> always get picked up in preference to what the Hive build downloads for 
> itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore

2010-08-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898384#action_12898384
 ] 

John Sichi commented on HIVE-1428:
--

I'll work on the 0.6 commit.


> ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
> --
>
> Key: HIVE-1428
> URL: https://issues.apache.org/jira/browse/HIVE-1428
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Paul Yang
>Assignee: Pradeep Kamath
> Fix For: 0.6.0
>
> Attachments: HIVE-1428-0.6.0-patch.txt, HIVE-1428-2.patch, 
> HIVE-1428-3.patch, HIVE-1428.patch, TestHiveMetaStoreRemote.java
>
>
> If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD 
> PARTITION commands will fail with an error similar to the following:
> [prade...@chargesize:~/dev/howl]hive --auxpath ult-serde.jar -e "ALTER TABLE 
> mytable add partition(datestamp = '20091101', srcid = '10',action) location 
> '/user/pradeepk/mytable/20091101/10';"
> 10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found 
> in the classpath. Usage of hadoop-site.xml is deprecated. Instead use 
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of 
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> Hive history 
> file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
> FAILED: Error in metadata: org.apache.thrift.TApplicationException: 
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> [prade...@chargesize:~/dev/howl]
> This is due to a check that tries to retrieve the partition to see if it 
> exists. If it does not, an attempt is made to pass a null value from the 
> metastore. Since thrift does not support null return values, an exception is 
> thrown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1536) Add support for JDBC PreparedStatements

2010-08-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898381#action_12898381
 ] 

John Sichi commented on HIVE-1536:
--

Sounds like an environmental problem, since I don't think I've heard reports of 
failures on that test elsewhere.  Can you check 
hive-trunk/build/ql/tmp/hive.log after the test run to find the exception 
details?  (Search for the text of the select statement executed by that test.)


> Add support for JDBC PreparedStatements
> ---
>
> Key: HIVE-1536
> URL: https://issues.apache.org/jira/browse/HIVE-1536
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Sean Flatley
>
> As a result of a Sprint which had us using Pentaho Data Integration with the 
> Hive database we have updated the driver.  Many PreparedStatement methods 
> have been implemented.  A patch will be attached tomorrow with a summary of 
> changes.
> Note:  A checkout of Hive/trunk was performed and the TestJdbcDriver test 
> cased was run.  This was done before any modifications were made to the 
> checked out project.  The testResultSetMetaData failed:
> java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: 
> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
>   at 
> org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
>   at 
> org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData(TestJdbcDriver.java:530)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at junit.framework.TestCase.runTest(TestCase.java:154)
>   at junit.framework.TestCase.runBare(TestCase.java:127)
>   at junit.framework.TestResult$1.protect(TestResult.java:106)
>   at junit.framework.TestResult.runProtected(TestResult.java:124)
>   at junit.framework.TestResult.run(TestResult.java:109)
>   at junit.framework.TestCase.run(TestCase.java:118)
>   at junit.framework.TestSuite.runTest(TestSuite.java:208)
>   at junit.framework.TestSuite.run(TestSuite.java:203)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
> A co-worker did the same and the tests passed.  Both environments were Ubuntu 
> and Hadoop version 0.20.2.
> Tests added to the TestJdbcDriver by us were successful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-08-13 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1538:


Assignee: Amareshwari Sriramadasu

> FilterOperator is applied twice with ppd on.
> 
>
> Key: HIVE-1538
> URL: https://issues.apache.org/jira/browse/HIVE-1538
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>
> With hive.optimize.ppd set to true, FilterOperator is applied twice, and it 
> seems the second operator always filters zero rows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898374#action_12898374
 ] 

John Sichi commented on HIVE-741:
-

@Ted:  as Amareshwari mentioned, a left outer join preserves rows on the left 
side regardless of whether the ON clause evaluates true.  So in that case (and 
similar cases for right/full outer join), we can't filter out the rows with 
null join keys.
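The distinction can be sketched with a toy nested-loop join (not Hive's actual join operator): a NULL key can never satisfy an equality ON clause, so an inner join may drop such rows early, but a LEFT OUTER JOIN must still emit the left row padded with NULLs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy join illustrating why NULL keys can be filtered for inner joins
// but must be preserved for left outer joins.
public class NullJoinDemo {
    public static List<String> join(String[] leftKeys, String[] rightKeys, boolean leftOuter) {
        List<String> out = new ArrayList<String>();
        for (String l : leftKeys) {
            boolean matched = false;
            if (l != null) { // NULL never equals anything, even another NULL
                for (String r : rightKeys) {
                    if (l.equals(r)) {
                        out.add(l + "," + r);
                        matched = true;
                    }
                }
            }
            if (leftOuter && !matched) {
                out.add(l + ",NULL"); // preserve the left row regardless
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] left = {null, "18"};
        String[] right = {"325", null};
        System.out.println(join(left, right, false)); // empty: inner join drops NULL keys
        System.out.println(join(left, right, true));  // two rows, each padded with NULL
    }
}
```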


> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
>
> With the following data in table input4_cb:
> Key     Value
> ----    -----
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL   325   18   NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Question about hive locking

2010-08-13 Thread John Sichi
On Aug 13, 2010, at 10:19 AM, Pradeep Kamath wrote:

> Hi,
>  Reading through http://wiki.apache.org/hadoop/Hive/Locking, it appears that 
> locking will be implemented using Zookeeper in the Query language - just 
> wanted to confirm that the metastore APIs are not being modified and that 
> there is no information about existing locks in the metastore - is this 
> correct? If so, won't a thrift API call circumvent existing locks and gain 
> read/write access? Did I miss something?


That's correct.  The locking is all from the Hive client side.  We want the 
locks to be released automatically if a client dies, and doing it from within 
the metastore thrift server wouldn't give us that.

You can also circumvent the locks by going directly to HDFS through any 
non-Hive means...we can't stop that.

JVS



[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898369#action_12898369
 ] 

John Sichi commented on HIVE-1514:
--

Yongqiang, for reference doc updates, remember to add a phrase like "(Note:  
only available starting with 0.7.0)" so that users of earlier Hive versions 
know they need to upgrade if they want the feature.

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1529) Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.

2010-08-13 Thread Mayank Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898364#action_12898364
 ] 

Mayank Lahiri commented on HIVE-1529:
-

Happy to help! It gets a lot easier after the first couple of UD(A)Fs...

For the code conventions, Hive uses the Sun Java code conventions: 
http://www.oracle.com/technetwork/java/codeconvtoc-136057.html (the example 
usage section is probably the most helpful, and I believe not all of them are 
checked by checkstyle.)

> Add ANSI SQL covariance aggregate functions: covar_pop and covar_samp.
> --
>
> Key: HIVE-1529
> URL: https://issues.apache.org/jira/browse/HIVE-1529
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1529.1.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Create new built-in aggregate functions covar_pop and covar_samp, functions 
> commonly used in statistical data analyses.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-13 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898350#action_12898350
 ] 

He Yongqiang commented on HIVE-1514:


I updated the wiki page here :
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Alter_Table.2BAC8-Partition_Location

This only changes the metadata. With this patch, you can point a partition at 
an external location and use a new fileformat; as long as the metadata you 
specify is correct, this will work.

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1514) Be able to modify a partition's fileformat and file location information.

2010-08-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898337#action_12898337
 ] 

Ashutosh Chauhan commented on HIVE-1514:


From the jira description and discussions, it's not clear to me what changes 
went in here.
It would be useful to summarize the use case this jira satisfies. From a 
cursory look at the patch, it seems the following is now possible:
{code}
ALTER TABLE table_name [partitionSpec] SET LOCATION "new location" set 
fileformat rcfile
{code}
or some such. If so, is the use case the following: a user created some data 
for an existing hive table externally (meaning through some process outside of 
hive) and now wants to query it from hive, so she needs to perform the 
metadata operation above (which is now enabled through this patch)?

> Be able to modify a partition's fileformat and file location information.
> -
>
> Key: HIVE-1514
> URL: https://issues.apache.org/jira/browse/HIVE-1514
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1514.1.patch, hive-1514.2.patch, hive-1514.3.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Question about hive locking

2010-08-13 Thread Pradeep Kamath
Hi,
  Reading through http://wiki.apache.org/hadoop/Hive/Locking, it appears that 
locking will be implemented using Zookeeper in the Query language - just wanted 
to confirm that the metastore APIs are not being modified and that there is no 
information about existing locks in the metastore - is this correct? If so, 
won't a thrift API call circumvent existing locks and gain read/write access? 
Did I miss something?

Thanks,
Pradeep


[jira] Updated: (HIVE-1539) Concurrent metastore threading problem

2010-08-13 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1539:
---

Attachment: thread_dump_hanging.txt

Thread dump.

> Concurrent metastore threading problem 
> ---
>
> Key: HIVE-1539
> URL: https://issues.apache.org/jira/browse/HIVE-1539
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Attachments: thread_dump_hanging.txt
>
>
> When running hive as a service and running a high number of queries 
> concurrently, I end up with multiple threads running at 100% cpu without any 
> progress.
> Looking at these threads I notice this thread(484e):
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598)
> But on a different thread(63a2):
> at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1539) Concurrent metastore threading problem

2010-08-13 Thread Bennie Schut (JIRA)
Concurrent metastore threading problem 
---

 Key: HIVE-1539
 URL: https://issues.apache.org/jira/browse/HIVE-1539
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Bennie Schut
Assignee: Bennie Schut


When running hive as a service and running a high number of queries 
concurrently, I end up with multiple threads running at 100% cpu without any 
progress.

Looking at these threads I notice this thread(484e):
at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598)

But on a different thread(63a2):
at 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1535) alter partition should throw exception if the specified partition does not exist.

2010-08-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1535:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Yongqiang

> alter partition should throw exception if the specified partition does not 
> exist.
> -
>
> Key: HIVE-1535
> URL: https://issues.apache.org/jira/browse/HIVE-1535
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1535.1.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898266#action_12898266
 ] 

Edward Capriolo commented on HIVE-1530:
---

@Joydeep +1 

> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-13 Thread Ted Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898194#action_12898194
 ] 

Ted Xu commented on HIVE-741:
-

Sorry, I expressed my idea incorrectly; I meant that rows with NULL join keys 
should be filtered out in the mappers.

I think rows with NULL join keys should be filtered out because NULL equals 
nothing, and Hive only supports equi-joins.

Please correct me if I'm wrong. 
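
To illustrate the point above with a minimal sketch (hypothetical Python, not 
Hive's actual operators; `inner_join` is an invented helper): under equi-join 
semantics NULL compares equal to nothing, so for an *inner* join, rows with 
NULL join keys can be dropped on the map side without changing the result.

```python
rows = [{"key": None, "value": 325}, {"key": 18, "value": None}]

def inner_join(a_rows, b_rows):
    # Hash equi-join on a.key = b.value. NULL compares equal to nothing
    # in SQL, so rows with NULL keys can be dropped up front ("in the
    # mapper") without changing the inner-join result.
    table = {}
    for b in b_rows:
        if b["value"] is not None:       # NULL build-side keys never match
            table.setdefault(b["value"], []).append(b)
    out = []
    for a in a_rows:
        if a["key"] is None:             # NULL probe-side key: filter early
            continue
        out.extend((a, b) for b in table.get(a["key"], []))
    return out

print(inner_join(rows, rows))  # [] -- the expected empty result set
```

Note this shortcut is only safe for inner joins; outer joins must keep the 
NULL-keyed rows and pad them, which is why the filtering location matters.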

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
>
> With the following data in table input4_cb:
> Key     Value
> ---     -----
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be empty set.
> When 'null' is replaced by '' it works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-08-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898144#action_12898144
 ] 

Amareshwari Sriramadasu commented on HIVE-1538:
---

Also, I observed that Select Operator is applied twice for a MapJoin query. Is 
it related to this?

> FilterOperator is applied twice with ppd on.
> 
>
> Key: HIVE-1538
> URL: https://issues.apache.org/jira/browse/HIVE-1538
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>
> With hive.optimize.ppd set to true, FilterOperator is applied twice. And it 
> seems the second operator always filters zero rows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898141#action_12898141
 ] 

Amareshwari Sriramadasu commented on HIVE-741:
--

bq. but it will be great if we can filter the NULL values in mappers.
Ted, we should not filter out the NULL values in mappers, because for outer 
joins these rows should be cartesian producted with nulls, as shown in the 
expected output in the [comment | 
https://issues.apache.org/jira/browse/HIVE-741?focusedCommentId=12896789&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12896789]

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
>
> With the following data in table input4_cb:
> Key     Value
> ---     -----
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be empty set.
> When 'null' is replaced by '' it works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898136#action_12898136
 ] 

Amareshwari Sriramadasu commented on HIVE-741:
--

Thanks Ning for the details.

To summarize the implementation of join: 
* In reduce-side join, rows with the same join keys are grouped together; in 
map-side join, rows with the same join keys are added to the same entry in the 
hash table. 
* CommonJoinOperator.checkAndGenObject: the rows with the same join key are 
cartesian producted with each other (i.e. with rows of different aliases). If 
there are no rows in one table alias, the rows of the other table alias are 
ignored (for inner joins) or cartesian producted with nulls (for outer joins). 

The above implementation works fine except for null join keys; since these 
rows are grouped together / hashed to the same entry, the current issue exists.
 
bq. I think the fix would be to check the NULL value of the join keys and do 
proper output based on the semantics of different types of joins.
This would need special handling for each type of join (inner, left outer, 
right outer, full outer, etc.). So, I'm thinking the better solution is to not 
group rows with null join keys together. Then the above join algorithm works 
correctly for all types of joins.

Currently they are grouped together because HiveKey.compare compares the bytes 
of the key (in the case of reduce-side join) and MapJoinObjectKey.equals 
returns true if both keys are null (in the case of map-side join). I'm trying 
to see if I can come up with a solution which does not group rows with null 
join keys together. Please correct me if I am wrong.
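
As a toy illustration of the direction described above (a hedged sketch, not 
Hive code; `group_key` and `full_outer_join` are invented names): if the 
grouping step gives every NULL key a unique surrogate instead of hashing all 
NULL keys to the same entry, the ordinary cartesian-product step then produces 
the correct answer for both inner and outer joins.

```python
import itertools
from collections import defaultdict

_null_ids = itertools.count()

def group_key(key):
    # Give every NULL key a unique surrogate so NULL rows are never
    # grouped with (and never matched against) any other row.
    return ("NULL", next(_null_ids)) if key is None else ("VAL", key)

def full_outer_join(a_rows, b_rows, a_key, b_key):
    groups = defaultdict(lambda: ([], []))
    for r in a_rows:
        groups[group_key(r[a_key])][0].append(r)
    for r in b_rows:
        groups[group_key(r[b_key])][1].append(r)
    out = []
    for a_side, b_side in groups.values():
        if a_side and b_side:          # matched group: cartesian product
            out += [(a, b) for a in a_side for b in b_side]
        else:                          # unmatched: pad the other side with NULL
            out += [(a, None) for a in a_side] + [(None, b) for b in b_side]
    return out

rows = [{"key": None, "value": 325}, {"key": 18, "value": None}]
# select * from t a full outer join t b on a.key = b.value:
# no real matches here, so every row comes out NULL-padded (4 rows).
print(full_outer_join(rows, rows, "key", "value"))
```

An inner join under the same grouping would simply drop the unmatched groups, 
yielding the expected empty result for the HIVE-741 data.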

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
>
> With the following data in table input4_cb:
> Key     Value
> ---     -----
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be empty set.
> When 'null' is replaced by '' it works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-741) NULL is not handled correctly in join

2010-08-13 Thread Ted Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898132#action_12898132
 ] 

Ted Xu commented on HIVE-741:
-

Thanks to Ning and Amareshwari; we are looking forward to seeing the bug fixed. 
I think it's okay to solve it by modifying the *JoinOperators, but it would be 
great if we could filter the NULL values in the mappers, say, in 
ReduceSinkOperator, provided we can know which part of the reduce sink key 
comes from a join (rather than from group by, distinct, etc.).  

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
>
> With the following data in table input4_cb:
> Key     Value
> ---     -----
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be empty set.
> When 'null' is replaced by '' it works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

2010-08-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898124#action_12898124
 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
---

With ppd on or off, the mapper filters the table with alias a on the predicate 
a.key < 100 for the left outer join query, and similarly on alias b for the 
right outer join query. This is mostly because of HIVE-1538.
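
A minimal sketch of why an AND predicate in the ON clause of a left outer join 
must not be pushed below the join (hypothetical Python, not Hive's planner; 
`left_outer_join` is an invented helper):

```python
def left_outer_join(a_rows, b_rows, on):
    # Naive nested-loop left outer join: unmatched left rows are NULL-padded.
    out = []
    for a in a_rows:
        matches = [b for b in b_rows if on(a, b)]
        out += [(a, b) for b in matches] or [(a, None)]
    return out

a = [{"key": 5}, {"key": 50}]
b = [{"key": 5}, {"key": 50}]
on = lambda x, y: x["key"] == y["key"] and x["key"] < 10

# Correct: the ON predicate only controls matching; the row with a.key=50
# still appears, padded with NULLs on the right side.
correct = left_outer_join(a, b, on)

# Wrong: pushing `a.key < 10` below the join drops that row entirely.
pushed = left_outer_join([r for r in a if r["key"] < 10], b, on)

print(correct)  # [({'key': 5}, {'key': 5}), ({'key': 50}, None)]
print(pushed)   # [({'key': 5}, {'key': 5})]  -- row for key=50 lost
```

The same asymmetry, mirrored, applies to right-side predicates in a right 
outer join.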

> predicate pushdown does not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Filter Operator applied twice on a where clause?

2010-08-13 Thread Amareshwari Sri Ramadasu
I observed that FilterOperator is applied only once if hive.optimize.ppd is set 
to false. I think there is a bug with predicate pushdown, so I raised HIVE-1538.

Thanks
Amareshwari

On 8/12/10 3:01 PM, "Amareshwari Sri Ramadasu"  wrote:

Hi,

I see that if a query has a where clause, the FilterOperator is applied twice. 
Can you tell me why it is done so?
It seems the second operator always filters zero rows.

Explain on a query with where clause :
hive> explain select * from input1 where input1.key != 10;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION 
(TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= 
(. (TOK_TABLE_OR_COL input1) key) 10

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
input1
  TableScan
alias: input1
Filter Operator
  predicate:
  expr: (key <> 10)
  type: boolean
  Filter Operator
predicate:
expr: (key <> 10)
type: boolean
Select Operator
  expressions:
expr: key
type: int
expr: value
type: int
  outputColumnNames: _col0, _col1
  File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
Fetch Operator
  limit: -1

I see the same from the mapper logs also: the first FilterOperator does the 
filtering and the second operator always filters zero rows.

2010-08-12 14:33:22,149 INFO ExecMapper:
<MAP>Id =5
  <Children>
    <TS>Id =0
      <Children>
        <FIL>Id =1
          <Children>
            <FIL>Id =2
              <Children>
                <SEL>Id =3
                  <Children>
                    <FS>Id =4
                      <Parent>Id = 3 null<\Parent>
                    <\FS>
                  <\Children>
                  <Parent>Id = 2 null<\Parent>
                <\SEL>
              <\Children>
              <Parent>Id = 1 null<\Parent>
            <\FIL>
          <\Children>
          <Parent>Id = 0 null<\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 5 null<\Parent>
    <\TS>
  <\Children>
<\MAP>

2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
forwarding 1 rows
2010-08-12 14:33:22,272 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 forwarding 1 rows
2010-08-12 14:33:22,450 INFO ExecMapper: ExecMapper: processing 1 rows: used 
memory = 4417072
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
DESERIALIZE_ERRORS:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 forwarded 1 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 
forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
FILTERED:1
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 
forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
FILTERED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
PASSED:0
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
forwarded 0 rows
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 
finished. closing...
2010-08-12 14:33:22,450 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 
forwarded 0 rows
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Final Path: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/00_0
2010-08-12 14:33:22,451 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Writing to temp file: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-12_14-33-14_470_1825337114959896683/_tmp.-ext-10001/_tmp.00_0
2010-08-12 14:33:22,454 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
New Final Path: FS 
hdfs://localhost:19000/tmp/hiv

[jira] Commented: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-08-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898122#action_12898122
 ] 

Amareshwari Sriramadasu commented on HIVE-1538:
---

With hive.optimize.ppd set to false, I see that the FilterOperator is applied 
only once.
{noformat}
hive> SET hive.optimize.ppd=false;
hive> explain select * from input1 where input1.key != 10;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION 
(TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= 
(. (TOK_TABLE_OR_COL input1) key) 10

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
input1
  TableScan
alias: input1
Filter Operator
  predicate:
  expr: (key <> 10)
  type: boolean
  Select Operator
expressions:
  expr: key
  type: int
  expr: value
  type: int
outputColumnNames: _col0, _col1
File Output Operator
  compressed: false
  GlobalTableId: 0
  table:
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
Fetch Operator
  limit: -1

Time taken: 0.022 seconds
{noformat}

> FilterOperator is applied twice with ppd on.
> 
>
> Key: HIVE-1538
> URL: https://issues.apache.org/jira/browse/HIVE-1538
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>
> With hive.optimize.ppd set to true, FilterOperator is applied twice. And it 
> seems the second operator always filters zero rows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-08-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898120#action_12898120
 ] 

Amareshwari Sriramadasu commented on HIVE-1538:
---

I see that if a query has a where clause, the FilterOperator is applied twice.

Explain on a query with where clause :
hive> explain select * from input1 where input1.key != 10;
{noformat}
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION 
(TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= 
(. (TOK_TABLE_OR_COL input1) key) 10

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
input1
  TableScan
alias: input1
Filter Operator
  predicate:
  expr: (key <> 10)
  type: boolean
  Filter Operator
predicate:
expr: (key <> 10)
type: boolean
Select Operator
  expressions:
expr: key
type: int
expr: value
type: int
  outputColumnNames: _col0, _col1
  File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
Fetch Operator
  limit: -1
Time taken: 0.099 seconds
{noformat}

I see the same from the mapper logs also: the first FilterOperator does the
filtering and the second operator always filters zero rows.

{noformat}

2010-08-13 13:20:21,451 INFO ExecMapper: 
<MAP>Id =5
  <Children>
    <TS>Id =0
      <Children>
        <FIL>Id =1
          <Children>
            <FIL>Id =2
              <Children>
                <SEL>Id =3
                  <Children>
                    <FS>Id =4
                      <Parent>Id = 3 null<\Parent>
                    <\FS>
                  <\Children>
                  <Parent>Id = 2 null<\Parent>
                <\SEL>
              <\Children>
              <Parent>Id = 1 null<\Parent>
            <\FIL>
          <\Children>
          <Parent>Id = 0 null<\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 5 null<\Parent>
    <\TS>
  <\Children>
<\MAP>
...
2010-08-13 13:20:21,489 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
forwarding 1 rows
2010-08-13 13:20:21,489 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 forwarding 1 rows
2010-08-13 13:20:21,600 INFO ExecMapper: ExecMapper: processing 1 rows: used 
memory = 10765360
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 
forwarded 1 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
DESERIALIZE_ERRORS:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 forwarded 1 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 
finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 
forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
FILTERED:1
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 
finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 
forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 
FILTERED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 
finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 
forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Final Path: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/00_0
2010-08-13 13:20:21,601 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Writing to temp file: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/_tmp.00_0
2010-08-13 13:20:21,604 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
New Final Path: FS 
hdfs://localho

[jira] Created: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-08-13 Thread Amareshwari Sriramadasu (JIRA)
FilterOperator is applied twice with ppd on.


 Key: HIVE-1538
 URL: https://issues.apache.org/jira/browse/HIVE-1538
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu


With hive.optimize.ppd set to true, FilterOperator is applied twice, and it 
seems the second operator always filters zero rows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-13 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898118#action_12898118
 ] 

Joydeep Sen Sarma commented on HIVE-1530:
-

Don't disallow hive.* options not specified in HiveConf. The reason is that 
Hive is extensible at various points via custom code; that code has access to 
the config object, and installs may want to set variables specific to their 
plugins, etc. (We shouldn't be in the business of dictating what they can 
name them.)

> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.