[jira] [Updated] (HIVE-3276) optimize union sub-queries

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3276:
-

Attachment: hive.3276.10.patch

 optimize union sub-queries
 --

 Key: HIVE-3276
 URL: https://issues.apache.org/jira/browse/HIVE-3276
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3276.10.patch, HIVE-3276.1.patch, 
 hive.3276.2.patch, hive.3276.3.patch, hive.3276.4.patch, hive.3276.5.patch, 
 hive.3276.6.patch, hive.3276.7.patch, hive.3276.8.patch, hive.3276.9.patch


 It might be a good idea to optimize simple union queries containing 
 map-reduce jobs in at least one of the sub-queries.
 For example, a query like:
 insert overwrite table T1 partition P1
 select * from 
 (
   subq1
 union all
   subq2
 ) u;
 today creates 3 map-reduce jobs, one for subq1, another for subq2 and 
 the final one for the union. 
 It might be a good idea to optimize this. Instead of creating the union 
 task, it might be simpler to create a move task (or something like a move
 task), where the outputs of the two sub-queries will be moved to the final 
 directory. This can easily extend to more than 2 sub-queries in the union.
 This is very useful if there is a select * followed by a filesink after the
 union. This can be independently useful, and can also be used to optimize
 skewed joins (https://cwiki.apache.org/Hive/skewed-join-optimization.html).
 If there is a select or filter between the union and the filesink, the select
 and the filter can be moved before the union, and the follow-up job can
 still be removed.
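
 The move-based idea described above can be sketched roughly as follows.
 This is only an illustration on a local filesystem, not Hive's move task:
 the class name UnionByMoveSketch, the method moveOutputs, and the
 "subqN_" file-name prefix are all hypothetical, and a real implementation
 would work against HDFS paths rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: replace the final "union" job with plain file moves.
class UnionByMoveSketch {

    // Moves every regular file written by each sub-query into finalDir and
    // returns how many files were moved. Works for any number of sub-queries.
    static int moveOutputs(List<Path> subQueryDirs, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        int moved = 0;
        for (int i = 0; i < subQueryDirs.size(); i++) {
            // Collect the listing first so we do not move files while iterating.
            List<Path> files = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(subQueryDirs.get(i))) {
                for (Path file : stream) {
                    if (Files.isRegularFile(file)) {
                        files.add(file);
                    }
                }
            }
            for (Path file : files) {
                // Prefix with the sub-query index (an assumption made here) so
                // identically named outputs from different sub-queries cannot collide.
                Path target = finalDir.resolve("subq" + (i + 1) + "_" + file.getFileName());
                Files.move(file, target);
                moved++;
            }
        }
        return moved;
    }
}
```

 No rows are read or rewritten in this sketch, which is the point of the
 proposal: the union of the sub-query outputs is just the set of their files.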

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3276) optimize union sub-queries

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3276:
-

Status: Patch Available  (was: Open)

comments addressed

 optimize union sub-queries
 --

 Key: HIVE-3276
 URL: https://issues.apache.org/jira/browse/HIVE-3276
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3276.10.patch, HIVE-3276.1.patch, 
 hive.3276.2.patch, hive.3276.3.patch, hive.3276.4.patch, hive.3276.5.patch, 
 hive.3276.6.patch, hive.3276.7.patch, hive.3276.8.patch, hive.3276.9.patch




[jira] [Updated] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3433:
-

Status: Patch Available  (was: Open)

addressed comments

 Implement CUBE and ROLLUP operators in Hive
 ---

 Key: HIVE-3433
 URL: https://issues.apache.org/jira/browse/HIVE-3433
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Sambavi Muthukrishnan
Assignee: Namit Jain
 Attachments: hive.3433.1.patch, hive.3433.2.patch






[jira] [Updated] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3433:
-

Attachment: hive.3433.2.patch

 Implement CUBE and ROLLUP operators in Hive
 ---

 Key: HIVE-3433
 URL: https://issues.apache.org/jira/browse/HIVE-3433
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Sambavi Muthukrishnan
Assignee: Namit Jain
 Attachments: hive.3433.1.patch, hive.3433.2.patch






[jira] [Commented] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468475#comment-13468475
 ] 

Namit Jain commented on HIVE-3514:
--

comments

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on the pieces targeted for reuse



[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468492#comment-13468492
 ] 

Namit Jain commented on HIVE-3433:
--

Shreepadma, I saw your ivy.xml changes in 
https://reviews.apache.org/r/6878/diff/?page=1.
I can do the conversion of bitset to fastbitset once your jira is in.



 Implement CUBE and ROLLUP operators in Hive
 ---

 Key: HIVE-3433
 URL: https://issues.apache.org/jira/browse/HIVE-3433
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Sambavi Muthukrishnan
Assignee: Namit Jain
 Attachments: hive.3433.1.patch, hive.3433.2.patch






Re: Review Request: HIVE-1362: Support for column statistics in Hive

2012-10-03 Thread namit jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12131
---


Some questions:

How much does this interact with hive.stats.reliable?

There are many places with a TODO.
The formatting needs to be fixed in many places.
Some functions return HashSet etc.; should they be changed to return a 
Set instead?
Can you make sure you use complete variable names?


metastore/if/hive_metastore.thrift
https://reviews.apache.org/r/6878/#comment25818

Does it make sense to add a Thrift API for updating statistics? There 
isn't an existing interface for updating
row-level statistics.



ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
https://reviews.apache.org/r/6878/#comment25819

Can you use the full variable name instead of Rwt?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
https://reviews.apache.org/r/6878/#comment25820

The LHS should not be an ArrayList.

Please fix all such occurrences.
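
For illustration, the convention being requested looks roughly like the
sketch below: variables are declared and results returned as List or Set,
and ArrayList or HashSet appears only on the right-hand side. The class and
method names are hypothetical and do not come from the patch under review.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example of "program to the interface" declarations.
class InterfaceTypesSketch {

    // The return type is the Set interface; HashSet stays an implementation detail.
    static Set<String> distinctNames(List<String> names) {
        Set<String> distinct = new HashSet<String>(names);
        return distinct;
    }

    static List<String> columnNames() {
        // The left-hand side is List, not ArrayList.
        List<String> columns = new ArrayList<String>();
        columns.add("key");
        columns.add("value");
        return columns;
    }
}
```

Callers then depend only on List and Set, so the concrete collection can be
swapped later without touching any signatures.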


- namit jain


On Oct. 3, 2012, 3:10 a.m., Shreepadma Venugopalan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/6878/
 ---
 
 (Updated Oct. 3, 2012, 3:10 a.m.)
 
 
 Review request for hive and Carl Steinbach.
 
 
 Description
 ---
 
 This patch implements version 1 of the column statistics project in Hive. It 
 adds support for computing and persisting statistical summary of column 
 values in Hive Tables and Partitions. In order to support column statistics 
 in Hive, this patch does the following:
 
 * Adds a new compute stats UDAF to compute scalar statistics for all 
 primitive Hive data types. In version 1 of the project, we support the 
 following scalar statistics on primitive types - estimate of number of 
 distinct values, number of null values, number of trues/falses for boolean 
 typed columns, max and avg length for string and binary typed columns, max 
 and min value for long and double typed columns. Note that version 1 of the 
 column stats project includes support for column statistics both at the table 
 and partition level.
 
 * Adds Metastore schema tables to persist the newly added statistics both at 
 table and partition level.
 * Adds Metastore Thrift API to persist, retrieve and delete column statistics 
 at both table and partition level. 
 Please refer to the following wiki link for the details of the schema and the 
 Thrift API changes - 
 https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
 
 * Extends the analyze table compute statistics statement to trigger 
 statistics computation and persistence for one or more columns. Please note 
 that statistics for multiple columns are computed through a single scan of the 
 table data. Please refer to the following wiki link for the syntax changes - 
 https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
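
 As a rough illustration of how the scalar statistics listed above can be
 gathered in a single pass, here is a minimal sketch for one nullable long
 column. It is not the UDAF from the patch: the class LongColumnStatsSketch
 and its fields are hypothetical, and the exact HashSet used for distinct
 values is a stand-in for the estimate the description mentions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical one-pass accumulator for a nullable long column.
class LongColumnStatsSketch {
    long numNulls;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    // Exact distinct values; an assumption made to keep the sketch small.
    final Set<Long> distinct = new HashSet<Long>();

    // Folds one column value into the running statistics.
    void add(Long value) {
        if (value == null) {
            numNulls++;
            return;
        }
        min = Math.min(min, value);
        max = Math.max(max, value);
        distinct.add(value);
    }

    // Scans the column once and returns the accumulated statistics.
    static LongColumnStatsSketch scan(List<Long> column) {
        LongColumnStatsSketch stats = new LongColumnStatsSketch();
        for (Long value : column) {
            stats.add(value);
        }
        return stats;
    }
}
```

 Statistics for several columns could be collected in the same scan by
 keeping one such accumulator per column.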
 
 One thing missing from the patch at this point is the metastore upgrade 
 scripts for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to 
 finalize the metastore schema changes before I go ahead and add the upgrade 
 scripts.
 
 In a follow-on patch, as part of version 2 of the column statistics project, 
 we will add support for computing, persisting and retrieving histograms on 
 long and double typed column values.
 
 Generated Thrift files have been removed for viewing pleasure. JIRA page has 
 the patch with the generated Thrift files.
 
 
 This addresses bug HIVE-1362.
 https://issues.apache.org/jira/browse/HIVE-1362
 
 
 Diffs
 -
 
   data/files/UserVisits.dat PRE-CREATION 
   data/files/binary.txt PRE-CREATION 
   data/files/bool.txt PRE-CREATION 
   data/files/double.txt PRE-CREATION 
   data/files/employee.dat PRE-CREATION 
   data/files/employee2.dat PRE-CREATION 
   data/files/int.txt PRE-CREATION 
   ivy/libraries.properties 7ac6778 
   metastore/if/hive_metastore.thrift d4fad72 
   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
 8fec13d 
   
 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
 17b986c 
   metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
 3883b5b 
   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 eff44b1 
   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a 
   metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa 
   
 metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
  PRE-CREATION 
   
 metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java
  PRE-CREATION 
   metastore/src/model/package.jdo 38ce6d5 
   
 

[jira] [Commented] (HIVE-1362) column level statistics

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468502#comment-13468502
 ] 

Namit Jain commented on HIVE-1362:
--

Are the stats collected while the table is being scanned, or only as part of 
analyze?

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt






[jira] [Updated] (HIVE-1362) column level statistics

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1362:
-

Status: Open  (was: Patch Available)

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt






Insert into vs Insert overwrite

2012-10-03 Thread Kasun Weranga
Hi all,

I would like to know the difference between Hive insert into and insert
overwrite for a Hive external table.

Thanks,
Kasun.


Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #157

2012-10-03 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/157/

--
[...truncated 5071 lines...]
A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql
A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan
A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/QueryPlan.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Adjacency.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Graph.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Task.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/AdjacencyType.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Stage.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/TaskType.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Query.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/NodeType.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Operator.java
A 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
A ql/src/gen/thrift/gen-php
A ql/src/gen/thrift/gen-php/queryplan
A ql/src/gen/thrift/gen-php/queryplan/queryplan_types.php
A ql/src/gen-javabean
A ql/src/gen-javabean/org
A ql/src/gen-javabean/org/apache
A ql/src/gen-javabean/org/apache/hadoop
A ql/src/gen-javabean/org/apache/hadoop/hive
A ql/src/gen-javabean/org/apache/hadoop/hive/ql
A ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan
A ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan/api
A ql/src/gen-php
A ql/build.xml
A ql/if
A ql/if/queryplan.thrift
A pdk
A pdk/ivy.xml
A pdk/scripts
A pdk/scripts/class-registration.xsl
A pdk/scripts/build-plugin.xml
A pdk/scripts/README
A pdk/src
A pdk/src/java
A pdk/src/java/org
A pdk/src/java/org/apache
A pdk/src/java/org/apache/hive
A pdk/src/java/org/apache/hive/pdk
A pdk/src/java/org/apache/hive/pdk/FunctionExtractor.java
A pdk/src/java/org/apache/hive/pdk/HivePdkUnitTest.java
A pdk/src/java/org/apache/hive/pdk/HivePdkUnitTests.java
A pdk/src/java/org/apache/hive/pdk/PluginTest.java
A pdk/test-plugin
A pdk/test-plugin/test
A pdk/test-plugin/test/cleanup.sql
A pdk/test-plugin/test/onerow.txt
A pdk/test-plugin/test/setup.sql
A pdk/test-plugin/src
A pdk/test-plugin/src/org
A pdk/test-plugin/src/org/apache
A pdk/test-plugin/src/org/apache/hive
A pdk/test-plugin/src/org/apache/hive/pdktest
A pdk/test-plugin/src/org/apache/hive/pdktest/Rot13.java
A pdk/test-plugin/build.xml
A pdk/build.xml
A build-offline.xml
 U.
At revision 1393573
no change for http://svn.apache.org/repos/asf/hive/branches/branch-0.9 since 
the previous build
[hive] $ /home/hudson/tools/ant/apache-ant-1.8.1/bin/ant 
-Dversion=0.9.1-SNAPSHOT very-clean tar binary
Buildfile: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build.xml

ivy-init-dirs:
 [echo] Project: hive
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/lib
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/report
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/maven

ivy-download:
 [echo] Project: hive
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
  [get] To: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/lib/ivy-2.1.0.jar

ivy-probe-antlib:
 [echo] Project: hive

ivy-init-antlib:
 [echo] Project: hive

ivy-clean-cache:
[ivy:cleancache] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
[ivy:cleancache] :: loading settings :: url = 
jar:file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/lib/ivy-2.1.0.jar!/org/apache/ivy/core/settings/ivysettings.xml

clean:
 [echo] Project: hive

clean:
 [echo] Project: anttasks

clean:
 [echo] Project: shims

clean:
 [echo] Project: common

clean:
 [echo] Project: serde

clean:
 [echo] Project: metastore


Re: Review Request: HIVE-1362: Support for column statistics in Hive

2012-10-03 Thread Shreepadma Venugopalan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12133
---



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
https://reviews.apache.org/r/6878/#comment25822

I'll replace the LHS types with generic Java types.


- Shreepadma Venugopalan


On Oct. 3, 2012, 3:10 a.m., Shreepadma Venugopalan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/6878/
 ---
 
 (Updated Oct. 3, 2012, 3:10 a.m.)
 
 
 Review request for hive and Carl Steinbach.
 
 
 Description
 ---
 
 This patch implements version 1 of the column statistics project in Hive. It 
 adds support for computing and persisting statistical summary of column 
 values in Hive Tables and Partitions. In order to support column statistics 
 in Hive, this patch does the following:
 
 * Adds a new compute stats UDAF to compute scalar statistics for all 
 primitive Hive data types. In version 1 of the project, we support the 
 following scalar statistics on primitive types - estimate of number of 
 distinct values, number of null values, number of trues/falses for boolean 
 typed columns, max and avg length for string and binary typed columns, max 
 and min value for long and double typed columns. Note that version 1 of the 
 column stats project includes support for column statistics both at the table 
 and partition level.
 
 * Adds Metastore schema tables to persist the newly added statistics both at 
 table and partition level.
 * Adds Metastore Thrift API to persist, retrieve and delete column statistics 
 at both table and partition level. 
 Please refer to the following wiki link for the details of the schema and the 
 Thrift API changes - 
 https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
 
 * Extends the analyze table compute statistics statement to trigger 
 statistics computation and persistence for one or more columns. Please note 
 that statistics for multiple columns are computed through a single scan of the 
 table data. Please refer to the following wiki link for the syntax changes - 
 https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
 
 One thing missing from the patch at this point is the metastore upgrade 
 scripts for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to 
 finalize the metastore schema changes before I go ahead and add the upgrade 
 scripts.
 
 In a follow-on patch, as part of version 2 of the column statistics project, 
 we will add support for computing, persisting and retrieving histograms on 
 long and double typed column values.
 
 Generated Thrift files have been removed for viewing pleasure. JIRA page has 
 the patch with the generated Thrift files.
 
 
 This addresses bug HIVE-1362.
 https://issues.apache.org/jira/browse/HIVE-1362
 
 
 Diffs
 -
 
   data/files/UserVisits.dat PRE-CREATION 
   data/files/binary.txt PRE-CREATION 
   data/files/bool.txt PRE-CREATION 
   data/files/double.txt PRE-CREATION 
   data/files/employee.dat PRE-CREATION 
   data/files/employee2.dat PRE-CREATION 
   data/files/int.txt PRE-CREATION 
   ivy/libraries.properties 7ac6778 
   metastore/if/hive_metastore.thrift d4fad72 
   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
 8fec13d 
   
 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
 17b986c 
   metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
 3883b5b 
   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 eff44b1 
   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a 
   metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa 
   
 metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
  PRE-CREATION 
   
 metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java
  PRE-CREATION 
   metastore/src/model/package.jdo 38ce6d5 
   
 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
  528a100 
   metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 
 925938d 
   ql/build.xml 5de3f78 
   ql/if/queryplan.thrift 05fbf58 
   ql/ivy.xml aa3b8ce 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 425900d 
   ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 4c8831f 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 4446952 
   ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 79b87f1 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7440889 
   
 

[jira] [Commented] (HIVE-3501) Track table and keys used in joins and group bys for logging

2012-10-03 Thread Sambavi Muthukrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468662#comment-13468662
 ] 

Sambavi Muthukrishnan commented on HIVE-3501:
-

Thanks Carl. Please let me know if you hit any issues with the tests.

 Track table and keys used in joins and group bys for logging
 

 Key: HIVE-3501
 URL: https://issues.apache.org/jira/browse/HIVE-3501
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Sambavi Muthukrishnan
Assignee: Sambavi Muthukrishnan
Priority: Minor
 Attachments: table_access_keys.1.patch, table_access_keys.2.patch, 
 table_access_keys.3.patch, table_access_keys.4.patch, 
 table_access_keys.5.patch

   Original Estimate: 96h
  Remaining Estimate: 96h

 For all operators that could benefit from bucketing, it will be useful to 
 keep track of and log the table names and key column names in order for the 
 operator to be converted to the bucketed version. This task is to track this 
 information for joins and group bys when the keys can be directly mapped back 
 to table scans and columns on that table. This information will be tracked on 
 the QueryPlan object so it is available to any pre/post execution hooks for 
 logging.



[jira] [Commented] (HIVE-1362) column level statistics

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468668#comment-13468668
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

@Namit: The stats are collected as part of analyze. We will look into implicit 
stats collection, i.e., when the table is scanned/loaded, in the next version of 
this project.

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt






Re: Review Request: HIVE-1362: Support for column statistics in Hive

2012-10-03 Thread Shreepadma Venugopalan


 On Oct. 3, 2012, 11:50 a.m., namit jain wrote:
  Some questions:
  
  How much does this interact with hive.stats.reliable?
  
  There are many places with a TODO.
  The formatting needs to be fixed in many places.
  Some functions return HashSet etc.; should they be changed to 
  return a Set instead?
  Can you make sure you use complete variable names?

This patch doesn't interact in any way with hive.stats.reliable. 

Will rename any variables with shortened names to use the full names.
Will return generic Java types instead of ArrayList, HashSet etc.
Will fix formatting.

There is only one place with a real TODO - the implementation of the 
Flajolet-Martin Sketch. I was planning to fix the TODO by making the accuracy 
percentage a configurable parameter. The other places with a TODO are 
auto-generated code that says TODO - Auto Generated method.
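
For readers unfamiliar with the technique, a single-bitmap Flajolet-Martin
estimator can be sketched roughly as below. This is a textbook-style
illustration, not the implementation in the patch: the class name, the
mixing function (whose constants are borrowed from the MurmurHash3 64-bit
finalizer), and the use of a single bitmap are all assumptions made here.

```java
// Hypothetical single-bitmap Flajolet-Martin distinct-value estimator.
class FlajoletMartinSketch {
    // Correction factor from the original Flajolet-Martin analysis.
    private static final double PHI = 0.77351;

    // Bit r is set once some hashed value had exactly r trailing zero bits.
    private long bitmap;

    void add(long value) {
        long hash = mix(value);
        int r = Long.numberOfTrailingZeros(hash); // 64 when hash == 0
        if (r < 64) {
            bitmap |= (1L << r);
        }
    }

    // Estimate = 2^R / PHI, where R is the index of the lowest unset bit.
    double estimate() {
        if (bitmap == 0L) {
            return 0.0; // nothing has been added yet
        }
        int r = Long.numberOfTrailingZeros(~bitmap);
        return Math.pow(2, r) / PHI;
    }

    // Illustrative bit-mixing function so that nearby inputs spread out.
    private static long mix(long value) {
        long h = value;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}
```

A single bitmap gives a coarse estimate with high variance. Accuracy is
usually improved by combining many bitmaps, which is presumably where a
configurable accuracy parameter would come in.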


 On Oct. 3, 2012, 11:50 a.m., namit jain wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java, line 52
  https://reviews.apache.org/r/6878/diff/3/?file=173533#file173533line52
 
  Can you use the full variable name instead of Rwt?

Will do.


 On Oct. 3, 2012, 11:50 a.m., namit jain wrote:
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java,
   line 161
  https://reviews.apache.org/r/6878/diff/3/?file=173542#file173542line161
 
  The LHS should not be an ArrayList.
  
  Please fix all such occurrences.

Will change any LHS occurrences to use generic Java types.


- Shreepadma


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12131
---



[jira] [Commented] (HIVE-3276) optimize union sub-queries

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468682#comment-13468682
 ] 

Namit Jain commented on HIVE-3276:
--

The tests finished fine

 optimize union sub-queries
 --

 Key: HIVE-3276
 URL: https://issues.apache.org/jira/browse/HIVE-3276
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3276.10.patch, HIVE-3276.1.patch, 
 hive.3276.2.patch, hive.3276.3.patch, hive.3276.4.patch, hive.3276.5.patch, 
 hive.3276.6.patch, hive.3276.7.patch, hive.3276.8.patch, hive.3276.9.patch


 It might be a good idea to optimize simple union queries containing 
 map-reduce jobs in at least one of the sub-queries.
 For example, a query like:
 insert overwrite table T1 partition P1
 select * from 
 (
   subq1
 union all
   subq2
 ) u;
 today creates 3 map-reduce jobs, one for subq1, another for subq2 and 
 the final one for the union. 
 It might be a good idea to optimize this. Instead of creating the union 
 task, it might be simpler to create a move task (or something like a move
 task), where the outputs of the two sub-queries will be moved to the final 
 directory. This can easily extend to more than 2 sub-queries in the union.
 This is very useful if there is a select * followed by filesink after the
 union. This can be independently useful, and also be used to optimize the
 skewed joins https://cwiki.apache.org/Hive/skewed-join-optimization.html.
 If there is a select, filter between the union and the filesink, the select
 and the filter can be moved before the union, and the follow-up job can
 still be removed.
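
As a rough illustration of the idea above (a Python sketch, not Hive code): each sub-query writes its output files to its own temporary directory, and a move step relocates those files into the final directory, so no extra job has to read and rewrite the data. The helper name `move_outputs`, the `subq<N>_` file prefix and the directory layout are assumptions made only for this example.

```python
import shutil
import tempfile
from pathlib import Path


def move_outputs(subquery_dirs, final_dir):
    """Move every output file of each sub-query into final_dir.

    Hypothetical helper: files are prefixed with the sub-query index so
    that equally named part files (e.g. "000000_0") do not collide.
    """
    final_dir = Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for index, directory in enumerate(subquery_dirs):
        for part in sorted(Path(directory).iterdir()):
            target = final_dir / f"subq{index}_{part.name}"
            shutil.move(str(part), str(target))
            moved.append(target.name)
    return moved


def demo():
    """Simulate two sub-query outputs and union them with a move only."""
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        outputs = []
        for index, rows in enumerate((["a", "b"], ["c"])):
            out = root / f"tmp_subq{index}"
            out.mkdir()
            (out / "000000_0").write_text("\n".join(rows) + "\n")
            outputs.append(out)
        names = move_outputs(outputs, root / "final")
        rows = []
        for name in names:
            rows.extend((root / "final" / name).read_text().split())
        return names, rows
```

The same loop handles any number of sub-queries, which matches the remark that the approach extends beyond two.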

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3433:
-

Attachment: hive.3433.3.patch

 Implement CUBE and ROLLUP operators in Hive
 ---

 Key: HIVE-3433
 URL: https://issues.apache.org/jira/browse/HIVE-3433
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Sambavi Muthukrishnan
Assignee: Namit Jain
 Attachments: hive.3433.1.patch, hive.3433.2.patch, hive.3433.3.patch






[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468684#comment-13468684
 ] 

Namit Jain commented on HIVE-3433:
--

[~shreepadma], thanks - using fastbitset instead.

 Implement CUBE and ROLLUP operators in Hive
 ---

 Key: HIVE-3433
 URL: https://issues.apache.org/jira/browse/HIVE-3433
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Sambavi Muthukrishnan
Assignee: Namit Jain
 Attachments: hive.3433.1.patch, hive.3433.2.patch, hive.3433.3.patch






[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468686#comment-13468686
 ] 

Shreepadma Venugopalan commented on HIVE-3433:
--

Thanks for making the change, Namit.

 Implement CUBE and ROLLUP operators in Hive
 ---

 Key: HIVE-3433
 URL: https://issues.apache.org/jira/browse/HIVE-3433
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Sambavi Muthukrishnan
Assignee: Namit Jain
 Attachments: hive.3433.1.patch, hive.3433.2.patch, hive.3433.3.patch






[jira] [Work started] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3514 started by Gang Tim Liu.

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on pieces targeted for reuse



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3514:
---

Attachment: HIVE-3514.patch.2

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on pieces targeted for reuse



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3514:
---

Status: Patch Available  (was: In Progress)

The patch is available both as an attachment and in D5727. Thanks.

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on pieces targeted for reuse



[jira] [Updated] (HIVE-3498) hivetest.py fails with --revision option

2012-10-03 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3498:


   Resolution: Fixed
Fix Version/s: 0.10.0
   Status: Resolved  (was: Patch Available)

Committed, thanks Ivan.

 hivetest.py fails with --revision option
 

 Key: HIVE-3498
 URL: https://issues.apache.org/jira/browse/HIVE-3498
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Ivan Gorbachev
Assignee: Ivan Gorbachev
  Labels: testing
 Fix For: 0.10.0

 Attachments: jira-3498.0.patch


 How to reproduce outside hivetest.py:
 1. Clone git://git.apache.org/hive.git
 2. Run ant arc-setup
 3. Run arc patch rev
 Output:
 {quote}
 This diff is against commit
 https://svn.apache.org/repos/asf/hive/trunk@1382631, but the commit is
 nowhere in the working copy. Try to apply it against the current working
 copy state? (d5f66df1edfff2645f225298e225dbccc70d97ff) [Y/n] 
 {quote}
 If you choose 'Y', it prompts you to complete the 'merge-message' and then prints:
 {quote}
  Select a Default Commit Range
 You're running a command which operates on a range of revisions (usually,
 from some revision to HEAD) but have not specified the revision that should
 determine the start of the range.
 Previously, arc assumed you meant 'HEAD^' when you did not specify a start
 revision, but this behavior does not make much sense in most workflows
 outside of Facebook's historic git-svn workflow.
 arc no longer assumes 'HEAD^'. You must specify a relative commit explicitly
 when you invoke a command (e.g., `arc diff HEAD^`, not just `arc diff`) or
 select a default for this working copy.
 In most cases, the best default is 'origin/master'. You can also select
 'HEAD^' to preserve the old behavior, or some other remote or branch. But you
 almost certainly want to select 'origin/master'.
 (Technically: the merge-base of the selected revision and HEAD is used to
 determine the start of the commit range.)
 What default do you want to use? [origin/master]
 {quote}
 An svn checkout does not show this behavior.



Re: Review Request: HIVE-1362: Support for column statistics in Hive

2012-10-03 Thread Shreepadma Venugopalan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/
---

(Updated Oct. 3, 2012, 7:16 p.m.)


Review request for hive and Carl Steinbach.


Changes
---

This revision addresses the review comments from revision #3, in particular the 
following:

* Fixes the TODOs. One TODO is still outstanding: make the accuracy a 
user-provided parameter for the Flajolet-Martin sketch in 
NumDisinctValueEstimator.java
* Fixes the formatting
* Uses Java generics on the LHS except in StatsSemanticAnalyzer.java. 
StatsSemanticAnalyzer.java inherits from BaseSemanticAnalyzer.java, and one of 
the methods StatsSemanticAnalyzer overrides from BaseSemanticAnalyzer returns a 
HashSet instead of a Set, so this patch doesn't use generics on the LHS in this 
particular instance. Changing that is beyond the scope of this JIRA; I will be 
happy to do it as part of a cleanup JIRA.
* Replaces shortened variable names with long variable names
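
For readers unfamiliar with the Flajolet-Martin sketch mentioned in the outstanding TODO, here is a minimal Python sketch of the general technique. It is not the code in the patch; the class name `FMSketch`, the `num_bitmaps` parameter (standing in for a user-provided accuracy setting) and the choice of BLAKE2b as the hash are all assumptions made for illustration.

```python
import hashlib

PHI = 0.77351  # correction constant from the Flajolet-Martin analysis


class FMSketch:
    """Toy Flajolet-Martin distinct-value estimator (illustrative only)."""

    def __init__(self, num_bitmaps=64):
        # More bitmaps lower the variance; this is the "accuracy" knob.
        self.bitmaps = [0] * num_bitmaps

    @staticmethod
    def _hash(value, seed):
        # One salted 64-bit hash per bitmap, derived from the value's repr.
        digest = hashlib.blake2b(
            repr(value).encode("utf-8"),
            digest_size=8,
            salt=seed.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little")

    def add(self, value):
        for seed in range(len(self.bitmaps)):
            h = self._hash(value, seed)
            # Index of the lowest set bit; 64 if the hash is zero.
            rank = (h & -h).bit_length() - 1 if h else 64
            self.bitmaps[seed] |= 1 << rank

    def estimate(self):
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while (bitmap >> r) & 1:  # position of the lowest unset bit
                r += 1
            total += r
        return 2 ** (total / len(self.bitmaps)) / PHI
```

Adding the same value again sets bits that are already set, so duplicates do not change the estimate. Raising `num_bitmaps` trades memory and hashing time for a tighter estimate.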


Description
---

This patch implements version 1 of the column statistics project in Hive. It 
adds support for computing and persisting statistical summaries of column values 
in Hive tables and partitions. In order to support column statistics in Hive, 
this patch does the following:

* Adds a new compute stats UDAF to compute scalar statistics for all primitive 
Hive data types. In version 1 of the project, we support the following scalar 
statistics on primitive types: an estimate of the number of distinct values, the 
number of null values, the number of trues/falses for boolean typed columns, max 
and avg length for string and binary typed columns, and max and min value for 
long and double typed columns. Note that version 1 of the column stats project 
includes support for column statistics at both the table and partition level.

* Adds Metastore schema tables to persist the newly added statistics both at 
table and partition level.
* Adds Metastore Thrift API to persist, retrieve and delete column statistics 
at both table and partition level. 
Please refer to the following wiki link for the details of the schema and the 
Thrift API changes - 
https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive

* Extends the analyze table compute statistics statement to trigger statistics 
computation and persistence for one or more columns. Please note that 
statistics for multiple columns are computed through a single scan of the table 
data. Please refer to the following wiki link for the syntax changes - 
https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
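
To make the single-scan point concrete, the following Python sketch collects several of the scalar statistics listed above for multiple columns in one pass over the rows. It only illustrates the idea and is not the UDAF from the patch; the function name, the dictionary layout and the use of Python types in place of Hive types are assumptions, and the distinct-value estimate is left out for brevity.

```python
def compute_column_stats(rows, columns):
    """Collect simple per-column statistics in one pass over rows.

    rows is an iterable of dicts; columns lists the column names to
    analyze. All names here are hypothetical, not Hive APIs.
    """
    stats = {
        name: {"num_nulls": 0, "min": None, "max": None,
               "max_len": 0, "total_len": 0, "num_strings": 0,
               "num_trues": 0, "num_falses": 0}
        for name in columns
    }
    for row in rows:  # the single scan: every column is updated per row
        for name in columns:
            value = row.get(name)
            s = stats[name]
            if value is None:
                s["num_nulls"] += 1
            elif isinstance(value, bool):  # check bool before int/float
                s["num_trues" if value else "num_falses"] += 1
            elif isinstance(value, (int, float)):
                s["min"] = value if s["min"] is None else min(s["min"], value)
                s["max"] = value if s["max"] is None else max(s["max"], value)
            elif isinstance(value, (str, bytes)):
                s["max_len"] = max(s["max_len"], len(value))
                s["total_len"] += len(value)
                s["num_strings"] += 1
    # Turn the running length totals into an average per column.
    for s in stats.values():
        count = s.pop("num_strings")
        total = s.pop("total_len")
        s["avg_len"] = total / count if count else 0.0
    return stats
```

Because every accumulator is updated row by row, adding another column to the request costs more work per row but no additional scan.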

One thing missing from the patch at this point is the metastore upgrade scripts 
for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to finalize the 
metastore schema changes before I go ahead and add the upgrade scripts.

In a follow-on patch, as part of version 2 of the column statistics project, we 
will add support for computing, persisting and retrieving histograms on long 
and double typed column values.

Generated Thrift files have been removed for viewing pleasure. JIRA page has 
the patch with the generated Thrift files.


This addresses bug HIVE-1362.
https://issues.apache.org/jira/browse/HIVE-1362


Diffs (updated)
-

  data/files/UserVisits.dat PRE-CREATION 
  data/files/binary.txt PRE-CREATION 
  data/files/bool.txt PRE-CREATION 
  data/files/double.txt PRE-CREATION 
  data/files/employee.dat PRE-CREATION 
  data/files/employee2.dat PRE-CREATION 
  data/files/int.txt PRE-CREATION 
  ivy/libraries.properties 7ac6778 
  metastore/if/hive_metastore.thrift d4fad72 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
8fec13d 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
17b986c 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
3883b5b 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java eff44b1 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a 
  metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa 
  
metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
 PRE-CREATION 
  
metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java
 PRE-CREATION 
  metastore/src/model/package.jdo 38ce6d5 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 528a100 
  metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 
925938d 
  ql/build.xml 5de3f78 
  ql/if/queryplan.thrift 05fbf58 
  ql/ivy.xml aa3b8ce 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 425900d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 4446952 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 79b87f1 
  

[jira] [Updated] (HIVE-1362) column level statistics

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1362:
-

Status: Patch Available  (was: Open)

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362.3.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt






[jira] [Updated] (HIVE-1362) column level statistics

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1362:
-

Attachment: HIVE-1362.3.patch.txt

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362.3.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt






[jira] [Commented] (HIVE-1362) column level statistics

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468754#comment-13468754
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

Latest revision which addresses Namit's comments is on review board.

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362.3.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt






[jira] [Updated] (HIVE-1362) column level statistics

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1362:
-

Attachment: HIVE-1362-gen_thrift.3.patch.txt

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362.3.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt






[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable

2012-10-03 Thread Raghotham Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy updated HIVE-3522:
---

Attachment: hive-3522.1.patch

 Make separator for Entity name configurable
 ---

 Key: HIVE-3522
 URL: https://issues.apache.org/jira/browse/HIVE-3522
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
Priority: Trivial
 Attachments: hive-3522.1.patch


 Right now it's hard-coded to '@'



[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable

2012-10-03 Thread Raghotham Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy updated HIVE-3522:
---

Status: Patch Available  (was: In Progress)

 Make separator for Entity name configurable
 ---

 Key: HIVE-3522
 URL: https://issues.apache.org/jira/browse/HIVE-3522
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
Priority: Trivial
 Attachments: hive-3522.1.patch


 Right now it's hard-coded to '@'



Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #157

2012-10-03 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/157/

--
[...truncated 36610 lines...]
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-10-03_13-48-40_623_4031868789239429849/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201210031348_928690341.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] Copying file: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/jenkins/hive_2012-10-03_13-48-46_620_6433981211820663306/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-10-03_13-48-46_620_6433981211820663306/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201210031348_424709819.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201210031348_1052999737.txt
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201210031348_252025720.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: 

[jira] [Created] (HIVE-3526) Column Statistics - Add support for equi-height histograms on numeric columns

2012-10-03 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3526:


 Summary: Column Statistics - Add support for equi-height 
histograms on numeric columns
 Key: HIVE-3526
 URL: https://issues.apache.org/jira/browse/HIVE-3526
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


This JIRA covers the task of adding support for equi-height histograms on 
numeric columns in Hive tables and partitions. This task involves a) 
implementing a UDAF to compute equi-height histograms on numeric columns, b) 
persisting the histogram to the metastore along with other column statistics, 
c) enhancing the Thrift API to retrieve the histogram along with the other 
statistics and d) extending the grammar of ANALYZE to allow the user to request 
histograms and specify the number of bins.
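
For reference, an equi-height histogram chooses bin boundaries so that each bin holds roughly the same number of rows. The Python sketch below shows the exact, sort-based version of that idea; it is only an illustration under that assumption, since a UDAF running over large tables would more likely need a streaming or approximate method. The function name and the `(low, high, count)` output format are hypothetical.

```python
def equi_height_histogram(values, num_bins):
    """Return (low, high, count) bins holding roughly equal row counts."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    data = sorted(v for v in values if v is not None)
    if not data:
        return []
    num_bins = min(num_bins, len(data))
    bins = []
    start = 0
    for i in range(1, num_bins + 1):
        # Integer arithmetic spreads any remainder across the bins.
        end = i * len(data) // num_bins
        chunk = data[start:end]
        bins.append((chunk[0], chunk[-1], len(chunk)))
        start = end
    return bins
```

Null values are skipped here, and the number of bins is capped at the number of non-null values so that no bin is empty.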



[jira] [Created] (HIVE-3525) Avro Maps with Nullable Values fail with NPE

2012-10-03 Thread Sean Busbey (JIRA)
Sean Busbey created HIVE-3525:
-

 Summary: Avro Maps with Nullable Values fail with NPE
 Key: HIVE-3525
 URL: https://issues.apache.org/jira/browse/HIVE-3525
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey


When working against current trunk@1393794, using a backing Avro schema that 
has a Map field with nullable values causes an NPE on deserialization when the 
map contains a null value.
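
One plausible shape of this kind of failure, shown as a Python analogue rather than the actual Hive SerDe code: a map schema whose values are the union `["null", "string"]` allows null entries, and a converter that calls a method on every value without checking for null fails on the first null it meets. The function names and the byte-string input are assumptions made for this sketch; the real cause in the Avro SerDe may differ.

```python
# Avro schema for a map whose values may be null (a union with "null").
NULLABLE_MAP_SCHEMA = {"type": "map", "values": ["null", "string"]}


def deserialize_map_naive(raw):
    """Convert every value without a null check (fails on None)."""
    return {key: value.decode("utf-8") for key, value in raw.items()}


def deserialize_map_null_safe(raw):
    """Pass null values through and convert only non-null ones."""
    return {
        key: None if value is None else value.decode("utf-8")
        for key, value in raw.items()
    }
```

In Python the naive version raises `AttributeError` on a `None` value, which plays the role of the NullPointerException described in the report.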



[jira] [Commented] (HIVE-3526) Column Statistics - Add support for equi-height histograms on numeric columns

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468905#comment-13468905
 ] 

Shreepadma Venugopalan commented on HIVE-3526:
--

HIVE-1362 covers the task of adding support for column level statistics in Hive.

 Column Statistics - Add support for equi-height histograms on numeric columns
 -

 Key: HIVE-3526
 URL: https://issues.apache.org/jira/browse/HIVE-3526
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 This JIRA covers the task of adding support for equi-height histograms on 
 numeric columns in Hive tables and partitions. This task involves a) 
 implementing a UDAF to compute equi-height histograms on numeric columns, b) 
 persisting the histogram to the metastore along with other column statistics,
 c) enhancing the Thrift API to retrieve the histogram along with the other 
 statistics and d) extending the grammar of ANALYZE to allow the user to 
 request histograms and specify the number of bins.



[jira] [Updated] (HIVE-3525) Avro Maps with Nullable Values fail with NPE

2012-10-03 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HIVE-3525:
--

Attachment: HIVE-3525.1.patch.txt

Patch with unit tests that express the NPE on deserialization and during the 
roundtrip for serialization.

Also shows that the object inspector is behaving correctly.

 Avro Maps with Nullable Values fail with NPE
 

 Key: HIVE-3525
 URL: https://issues.apache.org/jira/browse/HIVE-3525
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
 Attachments: HIVE-3525.1.patch.txt


 When working against current trunk@1393794, using a backing Avro schema that 
 has a Map field with nullable values causes a NPE on deserialization when the 
 map contains a null value.



[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468907#comment-13468907
 ] 

Shreepadma Venugopalan commented on HIVE-3525:
--

@Sean: Can you post a review request on review board or on phabricator? Thanks.

 Avro Maps with Nullable Values fail with NPE
 

 Key: HIVE-3525
 URL: https://issues.apache.org/jira/browse/HIVE-3525
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
 Attachments: HIVE-3525.1.patch.txt


 When working against current trunk@1393794, using a backing Avro schema that 
 has a Map field with nullable values causes a NPE on deserialization when the 
 map contains a null value.



[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE

2012-10-03 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468910#comment-13468910
 ] 

Sean Busbey commented on HIVE-3525:
---

It looks like this is because the Avro SerDe uses a Hashtable when reading out 
Avro Maps. The BinarySortableSerDe uses HashMap, so presumably it could as well.
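
The difference between the two map classes can be shown with plain java.util
code. The sketch below is not the Avro SerDe code; it only demonstrates that
java.util.Hashtable throws a NullPointerException when given a null value,
while java.util.HashMap accepts one. The class and method names are
hypothetical.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical illustration of the java.util behaviour; not Hive code.
class NullValueMapDemo {

    // Returns true if the given map implementation accepts a null value.
    static boolean acceptsNullValue(Map<String, Long> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            // Hashtable rejects null keys and null values.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Hashtable accepts null value: "
            + acceptsNullValue(new Hashtable<String, Long>()));
        System.out.println("HashMap accepts null value: "
            + acceptsNullValue(new HashMap<String, Long>()));
    }
}
```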

 Avro Maps with Nullable Values fail with NPE
 

 Key: HIVE-3525
 URL: https://issues.apache.org/jira/browse/HIVE-3525
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
 Attachments: HIVE-3525.1.patch.txt


 When working against current trunk@1393794, using a backing Avro schema that 
 has a Map field with nullable values causes a NPE on deserialization when the 
 map contains a null value.



[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE

2012-10-03 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468914#comment-13468914
 ] 

Sean Busbey commented on HIVE-3525:
---

[~shreepadma] Sure thing. Should I wait until the patch contains a solution, or 
post it now while it only contains the tests?

 Avro Maps with Nullable Values fail with NPE
 

 Key: HIVE-3525
 URL: https://issues.apache.org/jira/browse/HIVE-3525
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
 Attachments: HIVE-3525.1.patch.txt


 When working against current trunk@1393794, using a backing Avro schema that 
 has a Map field with nullable values causes a NPE on deserialization when the 
 map contains a null value.



[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468915#comment-13468915
 ] 

Shreepadma Venugopalan commented on HIVE-3525:
--

It looks like you have a patch attached to the JIRA page. Is this a 
work-in-progress patch? Is this something you would like us to review? It's a 
lot easier to perform the review on phabricator/reviewboard. 

 Avro Maps with Nullable Values fail with NPE
 

 Key: HIVE-3525
 URL: https://issues.apache.org/jira/browse/HIVE-3525
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
 Attachments: HIVE-3525.1.patch.txt


 When working against current trunk@1393794, using a backing Avro schema that 
 has a Map field with nullable values causes a NPE on deserialization when the 
 map contains a null value.



[jira] [Updated] (HIVE-3437) 0.23 compatibility: fix unit tests when building against 0.23

2012-10-03 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-3437:
--

Attachment: HIVE-3437-trunk-3.patch
HIVE-3437-0.9-3.patch

Updates that address reviewboard comments. Fix that gets NegativeMinimrCliDriver 
tests working with Hadoop 0.23.3.

 0.23 compatibility: fix unit tests when building against 0.23
 -

 Key: HIVE-3437
 URL: https://issues.apache.org/jira/browse/HIVE-3437
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.9.0, 0.10.0
Reporter: Chris Drome
Assignee: Chris Drome
 Fix For: 0.9.0, 0.10.0

 Attachments: HIVE-3437-0.9-1.patch, HIVE-3437-0.9-2.patch, 
 HIVE-3437-0.9-3.patch, HIVE-3437-0.9.patch, HIVE-3437-trunk-1.patch, 
 HIVE-3437-trunk-2.patch, HIVE-3437-trunk-3.patch, HIVE-3437-trunk.patch


 Many unit tests fail as a result of building the code against hadoop 0.23. 
 Initial focus will be to fix 0.9.



[jira] [Created] (HIVE-3527) Allow CREATE TABLE LIKE command to take TBLPROPERTIES

2012-10-03 Thread Kevin Wilfong (JIRA)
Kevin Wilfong created HIVE-3527:
---

 Summary: Allow CREATE TABLE LIKE command to take TBLPROPERTIES
 Key: HIVE-3527
 URL: https://issues.apache.org/jira/browse/HIVE-3527
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong


CREATE TABLE ... LIKE ... commands currently don't take TBLPROPERTIES.  I think 
it would be a useful feature.



[jira] [Commented] (HIVE-3527) Allow CREATE TABLE LIKE command to take TBLPROPERTIES

2012-10-03 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468934#comment-13468934
 ] 

Kevin Wilfong commented on HIVE-3527:
-

https://reviews.facebook.net/D5847

 Allow CREATE TABLE LIKE command to take TBLPROPERTIES
 -

 Key: HIVE-3527
 URL: https://issues.apache.org/jira/browse/HIVE-3527
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3527.1.patch.txt


 CREATE TABLE ... LIKE ... commands currently don't take TBLPROPERTIES.  I 
 think it would be a useful feature.



[jira] [Updated] (HIVE-3527) Allow CREATE TABLE LIKE command to take TBLPROPERTIES

2012-10-03 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3527:


Status: Patch Available  (was: Open)

 Allow CREATE TABLE LIKE command to take TBLPROPERTIES
 -

 Key: HIVE-3527
 URL: https://issues.apache.org/jira/browse/HIVE-3527
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3527.1.patch.txt


 CREATE TABLE ... LIKE ... commands currently don't take TBLPROPERTIES.  I 
 think it would be a useful feature.



[jira] [Updated] (HIVE-3527) Allow CREATE TABLE LIKE command to take TBLPROPERTIES

2012-10-03 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3527:


Attachment: HIVE-3527.1.patch.txt

 Allow CREATE TABLE LIKE command to take TBLPROPERTIES
 -

 Key: HIVE-3527
 URL: https://issues.apache.org/jira/browse/HIVE-3527
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3527.1.patch.txt


 CREATE TABLE ... LIKE ... commands currently don't take TBLPROPERTIES.  I 
 think it would be a useful feature.



Review Request: Unit tests for reproducing HIVE-3525

2012-10-03 Thread Sean Busbey

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7430/
---

Review request for hive.


Description
---

Unit test reproducing HIVE-3525


Diffs
-

  
/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
 1393794 
  
/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java
 1393794 
  
/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java
 1393794 

Diff: https://reviews.apache.org/r/7430/diff/


Testing
---

Run additional tests after patching against trunk. Uses an Avro Schema that has 
a single field which is a Map that allows null values. Object Inspector 
properly hides the union with null, but the deserializer can't actually handle 
null values.


Thanks,

Sean Busbey



Review Request: Unit tests to show failure to handle nullable complex types on serialization

2012-10-03 Thread Sean Busbey

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7431/
---

Review request for hive.


Description
---

Tests that express AvroSerDe's erroneous handling of Nullable complex types on 
serialization


Diffs
-

  
/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java
 1393805 

Diff: https://reviews.apache.org/r/7431/diff/


Testing
---

Adds 7 tests, one for each of the Avro types whose serialization needs a 
user-provided schema.


Thanks,

Sean Busbey



[jira] [Updated] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns

2012-10-03 Thread Zhenxiao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-3467:
---

Attachment: HIVE-3467.2.patch.txt

 BucketMapJoinOptimizer should optimize joins on partition columns
 -

 Key: HIVE-3467
 URL: https://issues.apache.org/jira/browse/HIVE-3467
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Zhenxiao Luo
 Attachments: HIVE-3467.1.patch.txt, HIVE-3467.2.patch.txt


 Consider the query:
 SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key;
 Where t1 and t2 are partitioned by part and bucketed by key.
 Suppose part takes values 1 and 2 and t1 and t2 are bucketed into 2 buckets.
 The bucket map join optimizer will put the first bucket of part=1 and part=2 
 partitions of t2 into the same mapper as that of part=1 partition of t1.  It 
 will do the same for the part=2 partition of t1.
 It could take advantage of the partition values and send the first bucket of 
 only the part=1 partitions of t1 and t2 into one mapper and the first bucket 
 of only the part=2 partitions into another.
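
The two pairings described above can be sketched in Java as follows. This is a
model of the description only, not the BucketMapJoinOptimizer code; the class
name, method names and the "t2/part=N/bucket_M" labels are hypothetical, and
"first bucket" is taken to mean bucket index 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the pairing described in the issue; not Hive code.
class BucketPairingSketch {

    // Behaviour described as current: a mapper reading one bucket of a t1
    // partition loads the same bucket of every t2 partition, so the t1
    // partition value (t1Part) is ignored.
    static List<String> currentPairing(int t1Part, int bucket, int[] t2Parts) {
        List<String> loaded = new ArrayList<String>();
        for (int p : t2Parts) {
            loaded.add("t2/part=" + p + "/bucket_" + bucket);
        }
        return loaded;
    }

    // Proposed behaviour: only the t2 partition whose value matches the
    // t1 partition is loaded.
    static List<String> proposedPairing(int t1Part, int bucket, int[] t2Parts) {
        List<String> loaded = new ArrayList<String>();
        for (int p : t2Parts) {
            if (p == t1Part) {
                loaded.add("t2/part=" + p + "/bucket_" + bucket);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        int[] parts = {1, 2};
        System.out.println("current:  " + currentPairing(1, 0, parts));
        System.out.println("proposed: " + proposedPairing(1, 0, parts));
    }
}
```

With two partitions, the proposed pairing halves the t2 data each mapper has to
load in this model.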



[jira] [Commented] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns

2012-10-03 Thread Zhenxiao Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468946#comment-13468946
 ] 

Zhenxiao Luo commented on HIVE-3467:


Comments addressed. Review request resubmitted at:
https://reviews.facebook.net/D5769

 BucketMapJoinOptimizer should optimize joins on partition columns
 -

 Key: HIVE-3467
 URL: https://issues.apache.org/jira/browse/HIVE-3467
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Zhenxiao Luo
 Attachments: HIVE-3467.1.patch.txt, HIVE-3467.2.patch.txt


 Consider the query:
 SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key;
 Where t1 and t2 are partitioned by part and bucketed by key.
 Suppose part takes values 1 and 2 and t1 and t2 are bucketed into 2 buckets.
 The bucket map join optimizer will put the first bucket of part=1 and part=2 
 partitions of t2 into the same mapper as that of part=1 partition of t1.  It 
 will do the same for the part=2 partition of t1.
 It could take advantage of the partition values and send the first bucket of 
 only the part=1 partitions of t1 and t2 into one mapper and the first bucket 
 of only the part=2 partitions into another.



[jira] [Updated] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns

2012-10-03 Thread Zhenxiao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-3467:
---

Status: Patch Available  (was: Open)

 BucketMapJoinOptimizer should optimize joins on partition columns
 -

 Key: HIVE-3467
 URL: https://issues.apache.org/jira/browse/HIVE-3467
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Zhenxiao Luo
 Attachments: HIVE-3467.1.patch.txt, HIVE-3467.2.patch.txt


 Consider the query:
 SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key;
 Where t1 and t2 are partitioned by part and bucketed by key.
 Suppose part takes values 1 and 2 and t1 and t2 are bucketed into 2 buckets.
 The bucket map join optimizer will put the first bucket of part=1 and part=2 
 partitions of t2 into the same mapper as that of part=1 partition of t1.  It 
 will do the same for the part=2 partition of t1.
 It could take advantage of the partition values and send the first bucket of 
 only the part=1 partitions of t1 and t2 into one mapper and the first bucket 
 of only the part=2 partitions into another.



[jira] [Created] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema

2012-10-03 Thread Sean Busbey (JIRA)
Sean Busbey created HIVE-3528:
-

 Summary: Avro SerDe doesn't handle serializing Nullable types that 
require access to a Schema
 Key: HIVE-3528
 URL: https://issues.apache.org/jira/browse/HIVE-3528
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey


Deserialization properly handles hiding Nullable Avro types, including complex 
types like record, map, array, etc. However, when Serialization attempts to 
write out these types it erroneously makes use of the UNION schema that 
contains NULL and the other type.

This results in Schema mis-match errors for Record, Array, Enum, Fixed, and 
Bytes.

Here's a [review board of unit tests that express the 
problem|https://reviews.apache.org/r/7431/], as well as one that shows the 
problem occurs only when the schema is needed.



[jira] [Commented] (HIVE-3498) hivetest.py fails with --revision option

2012-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468965#comment-13468965
 ] 

Hudson commented on HIVE-3498:
--

Integrated in Hive-trunk-h0.21 #1719 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1719/])
HIVE-3498. hivetest.py fails with --revision option. (Ivan Gorbachev via 
kevinwilfong) (Revision 1393676)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1393676
Files : 
* /hive/trunk/testutils/ptest/hivetest.py


 hivetest.py fails with --revision option
 

 Key: HIVE-3498
 URL: https://issues.apache.org/jira/browse/HIVE-3498
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Ivan Gorbachev
Assignee: Ivan Gorbachev
  Labels: testing
 Fix For: 0.10.0

 Attachments: jira-3498.0.patch


 How to reproduce outside hivetest.py:
 1. Clone git://git.apache.org/hive.git
 2. Run ant arc-setup
 3. Run arc patch rev
 Output:
 {quote}
 This diff is against commit
 https://svn.apache.org/repos/asf/hive/trunk@1382631, but the commit is
 nowhere in the working copy. Try to apply it against the current working
 copy state? (d5f66df1edfff2645f225298e225dbccc70d97ff) [Y/n] 
 {quote}
 If you choose 'Y', it asks you to complete the 'merge-message' and then prints:
 {quote}
  Select a Default Commit Range
 You're running a command which operates on a range of revisions (usually,
 from some revision to HEAD) but have not specified the revision that should
 determine the start of the range.
 Previously, arc assumed you meant 'HEAD^' when you did not specify a start
 revision, but this behavior does not make much sense in most workflows
 outside of Facebook's historic git-svn workflow.
 arc no longer assumes 'HEAD^'. You must specify a relative commit explicitly
 when you invoke a command (e.g., `arc diff HEAD^`, not just `arc diff`) or
 select a default for this working copy.
 In most cases, the best default is 'origin/master'. You can also select
 'HEAD^' to preserve the old behavior, or some other remote or branch. But you
 almost certainly want to select 'origin/master'.
 (Technically: the merge-base of the selected revision and HEAD is used to
 determine the start of the commit range.)
 What default do you want to use? [origin/master]
 {quote}
 The same behavior does not occur with an svn checkout.



[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable

2012-10-03 Thread Raghotham Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghotham Murthy updated HIVE-3522:
---

Attachment: hive-3522.2.patch

 Make separator for Entity name configurable
 ---

 Key: HIVE-3522
 URL: https://issues.apache.org/jira/browse/HIVE-3522
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
Priority: Trivial
 Attachments: hive-3522.1.patch, hive-3522.2.patch


 Right now it's hard-coded to '@'



[jira] [Commented] (HIVE-3522) Make separator for Entity name configurable

2012-10-03 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469017#comment-13469017
 ] 

Kevin Wilfong commented on HIVE-3522:
-

+1 Looks good.

 Make separator for Entity name configurable
 ---

 Key: HIVE-3522
 URL: https://issues.apache.org/jira/browse/HIVE-3522
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Raghotham Murthy
Assignee: Raghotham Murthy
Priority: Trivial
 Attachments: hive-3522.1.patch, hive-3522.2.patch


 Right now it's hard-coded to '@'



[jira] [Created] (HIVE-3529) Incorrect partition bucket/sort metadata when overwriting partition with different metadata from table

2012-10-03 Thread Kevin Wilfong (JIRA)
Kevin Wilfong created HIVE-3529:
---

 Summary: Incorrect partition bucket/sort metadata when overwriting 
partition with different metadata from table
 Key: HIVE-3529
 URL: https://issues.apache.org/jira/browse/HIVE-3529
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong


If you have a partition with bucket/sort metadata set, then alter the table 
to have different bucket/sort metadata, and then insert overwrite the partition 
with hive.enforce.bucketing=true and/or hive.enforce.sorting=true, the 
partition data will be bucketed/sorted according to the table's new metadata, 
but the partition will keep its old metadata.

This could lead to wrong query results.



[jira] [Updated] (HIVE-3518) QTestUtil side-effects

2012-10-03 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3518:
--

Attachment: HIVE-3518.D5865.1.patch

navis requested code review of HIVE-3518 [jira] QTestUtil side-effects.
Reviewers: JIRA

  DPAL-1907 QTestUtil side-effects

  It seems that QTestUtil has side-effects. This test (metadata_export_drop.q) 
causes failure of other tests on cleanup stage:

  Exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
Relative path in absolute URI: 
file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
  org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path 
in absolute URI: 
file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
  at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:845)
  at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:821)
  at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:445)
  at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:300)
  at org.apache.hadoop.hive.cli.TestCliDriver.tearDown(TestCliDriver.java:87)
  at junit.framework.TestCase.runBare(TestCase.java:140)
  at junit.framework.TestResult$1.protect(TestResult.java:110)
  at junit.framework.TestResult.runProtected(TestResult.java:128)
  at junit.framework.TestResult.run(TestResult.java:113)
  at junit.framework.TestCase.run(TestCase.java:124)
  at junit.framework.TestSuite.runTest(TestSuite.java:232)
  at junit.framework.TestSuite.run(TestSuite.java:227)
  at 
org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
  at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
  at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
  at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
  at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
  at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
  Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
Relative path in absolute URI: 
file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
  at org.apache.hadoop.fs.Path.initialize(Path.java:140)
  at org.apache.hadoop.fs.Path.<init>(Path.java:132)
  at 
org.apache.hadoop.fs.ProxyFileSystem.swizzleParamPath(ProxyFileSystem.java:56)
  at org.apache.hadoop.fs.ProxyFileSystem.mkdirs(ProxyFileSystem.java:214)
  at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
  at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1120)
  at 
org.apache.hadoop.hive.ql.parse.MetaDataExportListener.export_meta_data(MetaDataExportListener.java:81)
  at 
org.apache.hadoop.hive.ql.parse.MetaDataExportListener.onEvent(MetaDataExportListener.java:106)
  at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1024)
  at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table(HiveMetaStore.java:1185)
  at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:566)
  at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:839)
  ... 17 more
  Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
  at java.net.URI.checkPath(URI.java:1787)
  at java.net.URI.<init>(URI.java:735)
  at org.apache.hadoop.fs.Path.initialize(Path.java:137)
  ... 28 more
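
The innermost exception can be reproduced with java.net.URI alone. The sketch
below assumes, based on the trace, that the path ends up in the multi-argument
URI constructor together with the "file" scheme; it does not use Hadoop's Path
class, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration of the java.net.URI check; not Hadoop or Hive code.
class RelativePathUriDemo {

    // Returns the reason the URI was rejected, or null if it was accepted.
    static String rejectionReason(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // A scheme combined with a path that does not start with '/' is rejected.
        System.out.println(rejectionReason("file", "../build/ql/test/data/exports"));
        // The same scheme with an absolute path is accepted.
        System.out.println(rejectionReason("file", "/build/ql/test/data/exports"));
    }
}
```

The first call reports "Relative path in absolute URI", the same reason that
appears in the trace; the second call returns null.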

  Resetting 'hive.metastore.pre.event.listeners' to an empty string solves the 
issue. During debugging I figured out this property wasn't cleaned for other 
tests after it was set in metadata_export_drop.q.

  How to reproduce:

  ant test -Dtestcase=TestCliDriver -Dqfile=metadata_export_drop.q,some test.q

  where some test.q means any test which contains a CREATE statement. For 
example, sample10.q

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D5865

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/processors/ResetProcessor.java
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java

To: JIRA, navis


 QTestUtil side-effects
 --

 Key: HIVE-3518
 URL: https://issues.apache.org/jira/browse/HIVE-3518
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure, Tests
Reporter: Ivan Gorbachev
 Attachments: HIVE-3518.D5865.1.patch, metadata_export_drop.q


 It seems that QTestUtil has side-effects. This test 
 ([^metadata_export_drop.q]) causes failure of 

[jira] [Commented] (HIVE-3518) QTestUtil side-effects

2012-10-03 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469066#comment-13469066
 ] 

Navis commented on HIVE-3518:
-

QTestUtil creates a new HiveConf per test to remove side effects, but it's not 
propagated to entities like SessionState or MetaStoreClient. The patch fixes 
this but is not yet tested. Once it is, I'll mark this patch-available.
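
A minimal Java sketch of this kind of leak, using java.util.Properties as a
stand-in for HiveConf and a hypothetical Client class as a stand-in for a
long-lived object such as a metastore client. None of this is the actual
QTestUtil code, and the listener value is an assumption taken from the class
name in the stack trace.

```java
import java.util.Properties;

// Hypothetical model of a stale configuration reference; not Hive code.
class ConfLeakSketch {

    static final String LISTENERS = "hive.metastore.pre.event.listeners";

    // Stand-in for a long-lived object that keeps the conf it was created with.
    static class Client {
        private final Properties conf;

        Client(Properties conf) {
            this.conf = conf;
        }

        String listeners() {
            return conf.getProperty(LISTENERS, "");
        }
    }

    public static void main(String[] args) {
        // First test: the client is created, then the test sets a listener.
        Properties firstConf = new Properties();
        Client reused = new Client(firstConf);
        firstConf.setProperty(LISTENERS,
            "org.apache.hadoop.hive.ql.parse.MetaDataExportListener");

        // Second test: a fresh conf is created, but the old client is reused.
        Properties freshConf = new Properties();
        System.out.println("reused client sees:    '" + reused.listeners() + "'");
        System.out.println("recreated client sees: '" + new Client(freshConf).listeners() + "'");
    }
}
```

In this model the reused client still reports the first test's listener, while
a client built from the fresh conf reports an empty value.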

 QTestUtil side-effects
 --

 Key: HIVE-3518
 URL: https://issues.apache.org/jira/browse/HIVE-3518
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure, Tests
Reporter: Ivan Gorbachev
 Attachments: HIVE-3518.D5865.1.patch, metadata_export_drop.q


 It seems that QTestUtil has side-effects. This test 
 ([^metadata_export_drop.q]) causes failure of other tests on cleanup stage:
 {quote}
 Exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
     at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:845)
     at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:821)
     at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:445)
     at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:300)
     at org.apache.hadoop.hive.cli.TestCliDriver.tearDown(TestCliDriver.java:87)
     at junit.framework.TestCase.runBare(TestCase.java:140)
     at junit.framework.TestResult$1.protect(TestResult.java:110)
     at junit.framework.TestResult.runProtected(TestResult.java:128)
     at junit.framework.TestResult.run(TestResult.java:113)
     at junit.framework.TestCase.run(TestCase.java:124)
     at junit.framework.TestSuite.runTest(TestSuite.java:232)
     at junit.framework.TestSuite.run(TestSuite.java:227)
     at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
 Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
     at org.apache.hadoop.fs.Path.initialize(Path.java:140)
     at org.apache.hadoop.fs.Path.<init>(Path.java:132)
     at org.apache.hadoop.fs.ProxyFileSystem.swizzleParamPath(ProxyFileSystem.java:56)
     at org.apache.hadoop.fs.ProxyFileSystem.mkdirs(ProxyFileSystem.java:214)
     at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
     at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1120)
     at org.apache.hadoop.hive.ql.parse.MetaDataExportListener.export_meta_data(MetaDataExportListener.java:81)
     at org.apache.hadoop.hive.ql.parse.MetaDataExportListener.onEvent(MetaDataExportListener.java:106)
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1024)
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table(HiveMetaStore.java:1185)
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:566)
     at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:839)
     ... 17 more
 Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
     at java.net.URI.checkPath(URI.java:1787)
     at java.net.URI.<init>(URI.java:735)
     at org.apache.hadoop.fs.Path.initialize(Path.java:137)
     ... 28 more
 {quote}
 Resetting 'hive.metastore.pre.event.listeners' to an empty string solves the 
 issue. While debugging I found that this property wasn't cleared for other 
 tests after it was set in metadata_export_drop.q.
 How to reproduce:
 {code}
 ant test -Dtestcase=TestCliDriver -Dqfile=metadata_export_drop.q,some_test.q
 {code}
 where some_test.q is any test that contains a CREATE statement, for 
 example sample10.q.
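
The root cause at the bottom of the trace can be reproduced with the JDK alone. The following is a minimal sketch, assuming only java.net.URI: its multi-argument constructor rejects a relative path once a scheme such as "file" is supplied, which is the "Relative path in absolute URI" reason reported by URI.checkPath. The class and method names are illustrative and are not part of Hive or Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative helper; not Hive or Hadoop code.
final class RelativePathInAbsoluteUri {
    /** Returns the parser's reason for rejecting scheme + path, or null if the URI is accepted. */
    static String reasonFor(String scheme, String path) {
        try {
            // Multi-argument constructor: scheme, authority, path, query, fragment.
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // Scheme plus relative path: rejected.
        System.out.println(reasonFor("file", "../build/ql/test/data/exports"));
        // Scheme plus absolute path: accepted.
        System.out.println(reasonFor("file", "/build/ql/test/data/exports"));
        // No scheme, relative path: accepted as a relative URI.
        System.out.println(reasonFor(null, "../build/ql/test/data/exports"));
    }
}
```

This only illustrates why a listener left configured from an earlier test can fail on a relative export directory; it does not reproduce the QTestUtil cleanup itself.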



[jira] [Commented] (HIVE-3529) Incorrect partition bucket/sort metadata when overwriting partition with different metadata from table

2012-10-03 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469067#comment-13469067
 ] 

Kevin Wilfong commented on HIVE-3529:
-

My proposed fix is to always, by default, overwrite the partition's 
bucket/sort metadata with that of the table when overwriting a table. My 
main motivation for doing this, rather than keeping the partition's metadata, 
is dynamic partitions. Maintaining different bucket/sort schemes across 
several partitions that are overwritten dynamically sounds like a new feature 
rather than a bug fix, and could be done in a separate JIRA.

 Incorrect partition bucket/sort metadata when overwriting partition with 
 different metadata from table
 --

 Key: HIVE-3529
 URL: https://issues.apache.org/jira/browse/HIVE-3529
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong

 If you have a partition with bucket/sort metadata set, then alter the 
 table to have different bucket/sort metadata, and then insert overwrite the 
 partition with hive.enforce.bucketing=true and/or hive.enforce.sorting=true, 
 the partition data will be bucketed/sorted according to the table's metadata, 
 but the partition will keep its old metadata.
 This mismatch could lead to wrong query results.



[jira] [Updated] (HIVE-3036) hive should support BigDecimal datatype

2012-10-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3036:
-

Component/s: Types

 hive should support BigDecimal datatype
 ---

 Key: HIVE-3036
 URL: https://issues.apache.org/jira/browse/HIVE-3036
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor, Types
Affects Versions: 0.7.1, 0.8.0, 0.8.1
Reporter: Anurag Tangri
 Fix For: 0.10.0


 Hive supports bigint, but some use cases need exact decimal precision for 
 large values.
 The values in question are of the form decimal(x,y), 
 e.g. decimal(17,6), which cannot be represented exactly by float/double.
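
To make the precision point concrete, here is a small sketch using only the JDK. It assumes a value with 11 integer digits and 6 fractional digits, i.e. one that fits decimal(17,6); the class name and the sample value are illustrative. Near 1e11 adjacent doubles are about 1.5e-5 apart, so the sixth decimal place cannot survive a round trip through double, whereas java.math.BigDecimal keeps every digit.

```java
import java.math.BigDecimal;

// Illustrative demo; not Hive code.
final class DecimalPrecisionDemo {
    /** True when parsing the text as a double loses no information. */
    static boolean survivesDouble(String decimalText) {
        BigDecimal exact = new BigDecimal(decimalText);
        // new BigDecimal(double) exposes the exact binary value the double holds.
        BigDecimal viaDouble = new BigDecimal(Double.parseDouble(decimalText));
        return exact.compareTo(viaDouble) == 0;
    }

    public static void main(String[] args) {
        String value = "99999999999.999999"; // 11 integer + 6 fractional digits
        System.out.println(new BigDecimal(value));      // all 17 digits preserved
        System.out.println(Double.parseDouble(value));  // rounded to the nearest double
        System.out.println(survivesDouble(value));
        System.out.println(survivesDouble("0.5"));      // a value double can hold exactly
    }
}
```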
  



[jira] [Updated] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name

2012-10-03 Thread Zhenxiao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-1977:
---

Attachment: HIVE-1977.2.patch.txt

 DESCRIBE TABLE syntax doesn't support specifying a database qualified table 
 name
 

 Key: HIVE-1977
 URL: https://issues.apache.org/jira/browse/HIVE-1977
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema, Query Processor, SQL
Reporter: Carl Steinbach
Assignee: Zhenxiao Luo
 Attachments: HIVE-1977.1.patch.txt, HIVE-1977.2.patch.txt


 The syntax for DESCRIBE is broken. It should be:
 {code}
 DESCRIBE [EXTENDED] [database DOT]table [column]
 {code}
 but is actually
 {code}
 DESCRIBE [EXTENDED] table[DOT col_name]
 {code}
 Ref: http://dev.mysql.com/doc/refman/5.0/en/describe.html



[jira] [Commented] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name

2012-10-03 Thread Zhenxiao Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469096#comment-13469096
 ] 

Zhenxiao Luo commented on HIVE-1977:


@Namit: Thanks for your comments. I updated the patch with the following changes:

1. Instead of adding a new conf, try database.table first; if that is not valid 
(checked via tableValidCheck and databaseValidCheck), try table.column.
2. Got rid of isStandardSyntax and reworked the code.

Review request submitted at:
https://reviews.facebook.net/D5763
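
The lookup order described in point 1 can be sketched as follows. This is only an illustration under stated assumptions: the in-memory catalog, the class DescribeTargetResolver, and all of its methods are hypothetical, and they are not the tableValidCheck/databaseValidCheck code in the patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical resolver; not the code in the HIVE-1977 patch.
final class DescribeTargetResolver {
    // Toy catalog: database name -> table name -> column names.
    private final Map<String, Map<String, Set<String>>> catalog = new HashMap<>();
    private final String currentDatabase;

    DescribeTargetResolver(String currentDatabase) {
        this.currentDatabase = currentDatabase;
    }

    DescribeTargetResolver addTable(String db, String table, String... columns) {
        catalog.computeIfAbsent(db, k -> new HashMap<>())
               .put(table, new HashSet<>(Arrays.asList(columns)));
        return this;
    }

    /** Resolves "first.second": database.table wins, table.column is the fallback. */
    String resolve(String first, String second) {
        Map<String, Set<String>> tablesInFirst = catalog.get(first);
        if (tablesInFirst != null && tablesInFirst.containsKey(second)) {
            return "table " + first + "." + second;
        }
        Map<String, Set<String>> tablesInCurrent = catalog.get(currentDatabase);
        if (tablesInCurrent != null) {
            Set<String> columns = tablesInCurrent.get(first);
            if (columns != null && columns.contains(second)) {
                return "column " + second + " of " + currentDatabase + "." + first;
            }
        }
        throw new IllegalArgumentException("Cannot resolve " + first + "." + second);
    }

    static DescribeTargetResolver sample() {
        return new DescribeTargetResolver("default")
            .addTable("default", "src", "key", "value")
            .addTable("src", "key", "c1"); // a database that shares a table's name
    }

    public static void main(String[] args) {
        DescribeTargetResolver r = sample();
        System.out.println(r.resolve("src", "key"));   // database.table takes priority
        System.out.println(r.resolve("src", "value")); // falls back to table.column
    }
}
```

The sample catalog deliberately contains a database named like a table, to show that the order of the two checks decides the outcome when a name is ambiguous.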



[jira] [Updated] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name

2012-10-03 Thread Zhenxiao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-1977:
---

Status: Patch Available  (was: Open)



Re: Review Request: HIVE-3525

2012-10-03 Thread Sean Busbey

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7430/
---

(Updated Oct. 4, 2012, 3:11 a.m.)


Review request for hive.


Changes
---

Now includes a proposed fix, changing internal Hashtable use to HashMap.


Summary (updated)
-

HIVE-3525


Description (updated)
---

Changes the Avro SerDe to use HashMap when copying out the Avro Map<Utf8, Object> 
to Map<String, Object>. Fixes HIVE-3525.
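
For context on why the container type matters: java.util.Hashtable rejects null values with a NullPointerException, while java.util.HashMap accepts them. The sketch below shows only that JDK behaviour. It uses String keys directly so that it has no Avro dependency, and the class and method names are illustrative rather than taken from AvroDeserializer.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Illustrative only; not the AvroDeserializer code.
final class NullableMapCopy {
    /** Copies into a Hashtable; throws NullPointerException if any value is null. */
    static Map<String, Object> copyWithHashtable(Map<String, Object> source) {
        Map<String, Object> out = new Hashtable<>();
        for (Map.Entry<String, Object> e : source.entrySet()) {
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    /** Copies into a HashMap; null values are preserved. */
    static Map<String, Object> copyWithHashMap(Map<String, Object> source) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Object> e : source.entrySet()) {
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    /** A map with one nullable value, as a nullable Avro map might produce. */
    static Map<String, Object> sample() {
        Map<String, Object> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", null);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(copyWithHashMap(sample()).containsKey("b"));
        try {
            copyWithHashtable(sample());
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejected the null value");
        }
    }
}
```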


Diffs (updated)
-

  
  /trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 1393805 
  /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 1393805 
  /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java 1393805 
  /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java 1393805 

Diff: https://reviews.apache.org/r/7430/diff/


Testing (updated)
---

Includes unit tests for 

* AvroObjectInspectorGenerator to verify that the Nullable value type is 
presented as just the non-null type.
* AvroDeserializer to verify that Maps with null are properly handled
* AvroSerializer to verify that Maps with null can round trip.


Thanks,

Sean Busbey



[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE

2012-10-03 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469112#comment-13469112
 ] 

Sean Busbey commented on HIVE-3525:
---

[Review Board #7430|https://reviews.apache.org/r/7430/] Now contains a proposed 
fix as well as tests.

 Avro Maps with Nullable Values fail with NPE
 

 Key: HIVE-3525
 URL: https://issues.apache.org/jira/browse/HIVE-3525
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Sean Busbey
 Attachments: HIVE-3525.1.patch.txt


 When working against current trunk@1393794, using a backing Avro schema that 
 has a Map field with nullable values causes an NPE on deserialization when the 
 map contains a null value.



[jira] [Commented] (HIVE-3501) Track table and keys used in joins and group bys for logging

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469133#comment-13469133
 ] 

Namit Jain commented on HIVE-3501:
--

@Carl, let me know if you are swamped with other issues. I can start the tests 
and commit it if everything goes fine.

 Track table and keys used in joins and group bys for logging
 

 Key: HIVE-3501
 URL: https://issues.apache.org/jira/browse/HIVE-3501
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Sambavi Muthukrishnan
Assignee: Sambavi Muthukrishnan
Priority: Minor
 Attachments: table_access_keys.1.patch, table_access_keys.2.patch, 
 table_access_keys.3.patch, table_access_keys.4.patch, 
 table_access_keys.5.patch

   Original Estimate: 96h
  Remaining Estimate: 96h

 For all operators that could benefit from bucketing, it will be useful to 
 keep track of and log the table names and key column names in order for the 
 operator to be converted to the bucketed version. This task is to track this 
 information for joins and group bys when the keys can be directly mapped back 
 to table scans and columns on that table. This information will be tracked on 
 the QueryPlan object so it is available to any pre/post execution hooks for 
 logging.



[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469134#comment-13469134
 ] 

Namit Jain commented on HIVE-3433:
--

The tests finished successfully

 Implement CUBE and ROLLUP operators in Hive
 ---

 Key: HIVE-3433
 URL: https://issues.apache.org/jira/browse/HIVE-3433
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Sambavi Muthukrishnan
Assignee: Namit Jain
 Attachments: hive.3433.1.patch, hive.3433.2.patch, hive.3433.3.patch






[jira] [Commented] (HIVE-2874) Renaming external partition changes location

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469135#comment-13469135
 ] 

Namit Jain commented on HIVE-2874:
--

You are right. Renaming a partition of an external table should not change 
its location.

 Renaming external partition changes location
 

 Key: HIVE-2874
 URL: https://issues.apache.org/jira/browse/HIVE-2874
 Project: Hive
  Issue Type: Bug
Reporter: Kevin Wilfong
Assignee: Zhenxiao Luo
 Attachments: HIVE-2874.1.patch.txt, HIVE-2874.2.patch.txt, 
 HIVE-2874.3.patch.txt


 Renaming an external partition will change the location of that partition to 
 the default location of a managed partition with the same name.
 E.g. If ex_table is external and has partition part=1 with location 
 /.../managed_table/part=1
 Calling ALTER TABLE ex_table PARTITION (part = '1') RENAME TO PARTITION (part 
 = '2');
 Will change the location of the partition to /.../ex_table/part=2
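
The expected behaviour can be written down as a small rule. The sketch below is an assumption made for illustration, not Hive's metastore code: the helper name, its parameters, and the /warehouse paths are all hypothetical. It keeps an external partition's location unchanged on rename and derives the default location only for managed tables.

```java
// Hypothetical helper; names and paths are illustrative, not Hive's metastore code.
final class PartitionRenameLocation {
    /** Location a partition should have after being renamed. */
    static String locationAfterRename(boolean externalTable, String tableRoot,
                                      String currentLocation, String newPartitionName) {
        if (externalTable) {
            // External: the data stays where the user placed it.
            return currentLocation;
        }
        // Managed (assumed): default location derived from the new partition name.
        return tableRoot + "/" + newPartitionName;
    }

    public static void main(String[] args) {
        System.out.println(locationAfterRename(
            true, "/warehouse/ex_table", "/warehouse/managed_table/part=1", "part=2"));
        System.out.println(locationAfterRename(
            false, "/warehouse/managed_table", "/warehouse/managed_table/part=1", "part=2"));
    }
}
```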



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3514:
---

Attachment: HIVE-3514.patch.3

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on the pieces targeted for reuse



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3514:
-

Status: Open  (was: Patch Available)

comments on phabricator



[jira] [Commented] (HIVE-1362) column level statistics

2012-10-03 Thread shrikanth shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469143#comment-13469143
 ] 

shrikanth shankar commented on HIVE-1362:
-

I had a couple of high-level comments on the patch that seem to fit better here 
than on the review board. Apologies if this violates protocol.
(1) The count_stats aggregation operator 'repeats' many existing aggregates 
that Hive already supports (count of nulls, count of trues, max, min, etc.). It 
might make a lot more sense to just add an aggregate that returns the approximate 
number of distinct values for a column. Is there any reason why stats collection 
can't just generate more expressions in the SQL?
(2) There might even be value in adding a different UDAF that just returns a 
serialized numDV estimator. Storing this (instead of the count) could be useful 
in other places, e.g. combining numDV estimates across partitions. (A second UDAF 
would be needed to support aggregating these, but that seems easy.)
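
To illustrate point (2), here is a minimal sketch of a mergeable distinct-value estimator. It is an assumption chosen for brevity: a toy linear-counting bitmap, not the estimator used in the patch and not a Hive UDAF. The class DistinctValueSketch and its methods are hypothetical. What it shows is that a stored estimator can be combined across partitions, while two stored counts cannot, because values shared by both partitions would be counted twice.

```java
import java.util.BitSet;

// Toy linear-counting sketch; an illustration, not the patch's estimator.
final class DistinctValueSketch {
    private final int numBits;
    private final BitSet bits;

    DistinctValueSketch(int numBits) {
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    void add(Object value) {
        int h = value.hashCode() * 0x9E3779B9; // cheap multiplicative mixing
        h ^= h >>> 16;
        bits.set(Math.floorMod(h, numBits));
    }

    /** Union of two sketches; identical to the sketch of the combined input. */
    DistinctValueSketch merge(DistinctValueSketch other) {
        if (other.numBits != numBits) {
            throw new IllegalArgumentException("size mismatch");
        }
        DistinctValueSketch merged = new DistinctValueSketch(numBits);
        merged.bits.or(bits);
        merged.bits.or(other.bits);
        return merged;
    }

    /** Linear-counting estimate: -m * ln(zeroBits / m). */
    double estimate() {
        int zeros = numBits - bits.cardinality();
        if (zeros == 0) {
            return Double.POSITIVE_INFINITY; // bitmap saturated
        }
        return -numBits * Math.log((double) zeros / numBits);
    }

    /** Serialized form that could be stored per partition. */
    byte[] serialize() {
        return bits.toByteArray();
    }

    /** Builds a sketch over the integers in [from, to). */
    static DistinctValueSketch ofRange(int numBits, int from, int to) {
        DistinctValueSketch s = new DistinctValueSketch(numBits);
        for (int i = from; i < to; i++) {
            s.add(i);
        }
        return s;
    }

    public static void main(String[] args) {
        DistinctValueSketch p1 = ofRange(4096, 0, 600);    // partition 1
        DistinctValueSketch p2 = ofRange(4096, 400, 1000); // partition 2, overlapping
        System.out.println(p1.merge(p2).estimate());
        System.out.println(ofRange(4096, 0, 1000).estimate());
    }
}
```

Because merging is a bitwise OR, the merged sketch is bit-for-bit the sketch of the union, so both lines printed by main are the same number.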

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362.3.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt






[jira] [Updated] (HIVE-1362) column level statistics

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1362:
-

Status: Open  (was: Patch Available)

Questions on the jira ?



[jira] [Commented] (HIVE-2874) Renaming external partition changes location

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469157#comment-13469157
 ] 

Namit Jain commented on HIVE-2874:
--

+1



[jira] [Work started] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3514 started by Gang Tim Liu.

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, 
 HIVE-3514.patch.4


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on the pieces targeted for reuse



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3514:
---

Attachment: HIVE-3514.patch.4



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3514:
---

Status: Patch Available  (was: In Progress)

Patch is available in both places.



[jira] [Created] (HIVE-3530) warnings in Hive.g

2012-10-03 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3530:


 Summary: warnings in Hive.g
 Key: HIVE-3530
 URL: https://issues.apache.org/jira/browse/HIVE-3530
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain


 [echo] Building Grammar /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
 [java] ANTLR Parser Generator  Version 3.0.1 (August 13, 2007)  1989-2007
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5: Decision can match input such as Identifier KW_RENAME KW_TO using multiple alternatives: 1, 10
 [java] As a result, alternative(s) 10 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5: Decision can match input such as Identifier DOT Identifier using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5: Decision can match input such as KW_ORDER KW_BY LPAREN using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1836:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1848:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1859:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input


Most of these seem to be due to HIVE-1367




[jira] [Updated] (HIVE-3530) warnings in Hive.g

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3530:
-

Description: 
 Building Grammar 
/Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g  
 ANTLR Parser Generator  Version 3.0.1 (August 13, 2007)  1989-2007
 warning(200): 
/Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5:
 Decision can match input such as Identifier KW_RENAME KW_TO using multiple 
alternatives: 1, 10
 As a result, alternative(s) 10 were disabled for that input
 warning(200): 
/Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5:
 Decision can match input such as Identifier DOT Identifier using multiple 
alternatives: 1, 2
 As a result, alternative(s) 2 were disabled for that input
 warning(200): 
/Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5:
 Decision can match input such as KW_ORDER KW_BY LPAREN using multiple 
alternatives: 1, 2
 As a result, alternative(s) 2 were disabled for that input
 warning(200): 
/Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1836:5:
 Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple 
alternatives: 1, 2
 As a result, alternative(s) 2 were disabled for that input
 warning(200): 
/Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1848:5:
 Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple 
alternatives: 1, 2
 As a result, alternative(s) 2 were disabled for that input
 warning(200): 
/Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1859:5:
 Decision can match input such as KW_SORT KW_BY LPAREN using multiple 
alternatives: 1, 2
 As a result, alternative(s) 2 were disabled for that input


Most of these seem to be due to HIVE-1367


  was:
 [echo] Building Grammar /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
 [java] ANTLR Parser Generator  Version 3.0.1 (August 13, 2007)  1989-2007
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5: Decision can match input such as Identifier KW_RENAME KW_TO using multiple alternatives: 1, 10
 [java] As a result, alternative(s) 10 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5: Decision can match input such as Identifier DOT Identifier using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5: Decision can match input such as KW_ORDER KW_BY LPAREN using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1836:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1848:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1859:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input


Most of these seem to be due to HIVE-1367



 warnings in Hive.g
 --

 Key: HIVE-3530
 URL: https://issues.apache.org/jira/browse/HIVE-3530
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

  Building Grammar 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g  
 
  ANTLR Parser Generator  Version 3.0.1 (August 13, 2007)  1989-2007
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5:
  Decision can match input such as Identifier KW_RENAME KW_TO using multiple 
 alternatives: 1, 10
  As a result, alternative(s) 10 were disabled for that input
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5:
  Decision can match input such as Identifier DOT Identifier using multiple 
 alternatives: 1, 2
  As a result, alternative(s) 2 were disabled for that input
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5:
  Decision 

[jira] [Commented] (HIVE-3530) warnings in Hive.g

2012-10-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469160#comment-13469160
 ] 

Namit Jain commented on HIVE-3530:
--

[~zhenxiao], can you take a look if possible?

 warnings in Hive.g
 --

 Key: HIVE-3530
 URL: https://issues.apache.org/jira/browse/HIVE-3530
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

  Building Grammar 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g  
 
  ANTLR Parser Generator  Version 3.0.1 (August 13, 2007)  1989-2007
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5:
  Decision can match input such as Identifier KW_RENAME KW_TO using multiple 
 alternatives: 1, 10
  As a result, alternative(s) 10 were disabled for that input
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5:
  Decision can match input such as Identifier DOT Identifier using multiple 
 alternatives: 1, 2
  As a result, alternative(s) 2 were disabled for that input
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5:
  Decision can match input such as KW_ORDER KW_BY LPAREN using multiple 
 alternatives: 1, 2
  As a result, alternative(s) 2 were disabled for that input
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1836:5:
  Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple 
 alternatives: 1, 2
  As a result, alternative(s) 2 were disabled for that input
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1848:5:
  Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple 
 alternatives: 1, 2
  As a result, alternative(s) 2 were disabled for that input
  warning(200): 
 /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1859:5:
  Decision can match input such as KW_SORT KW_BY LPAREN using multiple 
 alternatives: 1, 2
  As a result, alternative(s) 2 were disabled for that input
 Most of these seem to be due to HIVE-1367
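
To illustrate what warning(200) is reporting: in ANTLR 3, when two alternatives of a decision can match the same lookahead, the conflict is resolved in favour of the lowest-numbered alternative and the others are disabled for that input. The sketch below is a toy Python model of that rule, not ANTLR code. The function names and the two alternatives are hypothetical and are not the actual rules at Hive.g:1823.

```python
from typing import Dict, List, Sequence, Tuple


def viable_alternatives(
    alternatives: Dict[int, Sequence[str]], lookahead: Sequence[str]
) -> List[int]:
    """Return the numbers of all alternatives whose token prefix matches the lookahead."""
    return sorted(
        number
        for number, prefix in alternatives.items()
        if tuple(lookahead[: len(prefix)]) == tuple(prefix)
    )


def resolve(
    alternatives: Dict[int, Sequence[str]], lookahead: Sequence[str]
) -> Tuple[int, List[int]]:
    """Pick the lowest-numbered viable alternative and report the ones it shadows."""
    viable = viable_alternatives(alternatives, lookahead)
    if not viable:
        raise ValueError("no viable alternative for this input")
    return viable[0], viable[1:]


# Hypothetical alternatives (assumption): 1 expects a parenthesised list,
# 2 expects a plain expression list, which may itself begin with LPAREN.
order_by_alternatives = {
    1: ("KW_ORDER", "KW_BY", "LPAREN"),
    2: ("KW_ORDER", "KW_BY"),
}

chosen, disabled = resolve(
    order_by_alternatives, ["KW_ORDER", "KW_BY", "LPAREN", "Identifier"]
)
print(chosen, disabled)  # prints: 1 [2]
```

In this model alternative 2 is still reachable for input that does not start with LPAREN after KW_BY, which matches the wording "disabled for that input" in the warnings.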

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-3514:


Assignee: Gang Tim Liu

more comments

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, 
 HIVE-3514.patch.4


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on the pieces targeted for reuse
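
A rough sketch of the kind of reuse described above, written in Python for brevity rather than in Hive's Java. Everything here is hypothetical: the Operator class, the walk and build_pruning_predicate helpers, and the string predicates are assumptions made for illustration and do not reflect the classes in the attached patches. The idea shown is one shared tree walk with a pluggable test for which predicates a given pruner keeps.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Operator:
    """Hypothetical operator node: 'TS' table scan, 'FIL' filter, 'SEL' select."""
    name: str
    predicate: Optional[str] = None
    children: List["Operator"] = field(default_factory=list)


def walk(root: Operator) -> Iterator[Operator]:
    """Shared step: depth-first, pre-order walk of the operator tree."""
    stack = [root]
    while stack:
        op = stack.pop()
        yield op
        # Reverse so that the leftmost child is visited first.
        stack.extend(reversed(op.children))


def build_pruning_predicate(root: Operator, keep: Callable[[str], bool]) -> List[str]:
    """Shared step: collect the filter predicates a particular pruner cares about."""
    return [
        op.predicate
        for op in walk(root)
        if op.name == "FIL" and op.predicate is not None and keep(op.predicate)
    ]


plan = Operator(
    "TS",
    children=[
        Operator("FIL", "ds = '2012-10-03'", [Operator("FIL", "key = 5", [Operator("SEL")])])
    ],
)

# Two pruners reuse the same walk and differ only in the predicate test.
partition_predicates = build_pruning_predicate(plan, lambda p: p.startswith("ds "))
list_bucketing_predicates = build_pruning_predicate(plan, lambda p: p.startswith("key "))
print(partition_predicates, list_bucketing_predicates)
```

Keeping the walk and the predicate collection separate from the per-pruner test is one way to stay within consideration 2, since only the pieces actually shared are factored out.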



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3514:
-

Assignee: (was: Gang Tim Liu)
  Status: Open  (was: Patch Available)

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, 
 HIVE-3514.patch.4


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on the pieces targeted for reuse



Re: Review Request: HIVE-1362: Support for column statistics in Hive

2012-10-03 Thread Shreepadma Venugopalan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/
---

(Updated Oct. 4, 2012, 5:45 a.m.)


Review request for hive and Carl Steinbach.


Changes
---

The previous version of the patch didn't render correctly. This version fixes 
that problem. Sorry about the earlier version.


Description
---

This patch implements version 1 of the column statistics project in Hive. It 
adds support for computing and persisting a statistical summary of column values 
in Hive tables and partitions. To support column statistics in Hive, this patch 
does the following:

* Adds a new compute stats UDAF to compute scalar statistics for all primitive 
Hive data types. In version 1 of the project, we support the following scalar 
statistics on primitive types: an estimate of the number of distinct values, the 
number of null values, the number of trues/falses for boolean typed columns, max 
and avg length for string and binary typed columns, and max and min value for 
long and double typed columns. Note that version 1 of the column stats project 
includes support for column statistics at both the table and partition level.

* Adds Metastore schema tables to persist the newly added statistics both at 
table and partition level.
* Adds Metastore Thrift API to persist, retrieve and delete column statistics 
at both table and partition level. 
Please refer to the following wiki link for the details of the schema and the 
Thrift API changes - 
https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive

* Extends the analyze table compute statistics statement to trigger statistics 
computation and persistence for one or more columns. Please note that 
statistics for multiple columns are computed through a single scan of the table 
data. Please refer to the following wiki link for the syntax changes - 
https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
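
As a rough illustration of computing the version 1 scalar statistics for several columns in one scan, here is a minimal Python sketch. It is not the UDAF from the patch: the ColumnStats class and compute_stats function are hypothetical names, rows are assumed to be plain dictionaries, and the distinct count is exact (a set) where the patch uses an estimate.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional, Set


@dataclass
class ColumnStats:
    """Running scalar statistics for one column (hypothetical structure)."""
    num_nulls: int = 0
    num_trues: int = 0
    num_falses: int = 0
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    max_len: int = 0
    total_len: int = 0
    num_lengths: int = 0
    distinct: Set[Any] = field(default_factory=set)  # exact, not an estimate

    def update(self, value: Any) -> None:
        if value is None:
            self.num_nulls += 1
            return
        self.distinct.add(value)
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            if value:
                self.num_trues += 1
            else:
                self.num_falses += 1
        elif isinstance(value, (int, float)):
            self.min_value = value if self.min_value is None else min(self.min_value, value)
            self.max_value = value if self.max_value is None else max(self.max_value, value)
        elif isinstance(value, (str, bytes)):
            self.max_len = max(self.max_len, len(value))
            self.total_len += len(value)
            self.num_lengths += 1

    @property
    def avg_len(self) -> float:
        return self.total_len / self.num_lengths if self.num_lengths else 0.0


def compute_stats(rows: Iterable[Dict[str, Any]], columns: List[str]) -> Dict[str, ColumnStats]:
    """Update the statistics of every requested column during one pass over the rows."""
    stats = {column: ColumnStats() for column in columns}
    for row in rows:  # the only scan of the data
        for column in columns:
            stats[column].update(row.get(column))
    return stats


rows = [
    {"name": "ann", "age": 31},
    {"name": None, "age": 25},
    {"name": "robert", "age": 31},
]
stats = compute_stats(rows, ["name", "age"])
print(stats["name"].num_nulls, stats["name"].max_len, stats["name"].avg_len)  # prints: 1 6 4.5
print(stats["age"].min_value, stats["age"].max_value, len(stats["age"].distinct))  # prints: 25 31 2
```

Because each column keeps its own accumulator, adding another column to the request costs no extra scan, only one more update per row.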

One thing missing from the patch at this point is the metastore upgrade scripts 
for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to finalize the 
metastore schema changes before I go ahead and add the upgrade scripts.

In a follow-up patch, as part of version 2 of the column statistics project, we 
will add support for computing, persisting and retrieving histograms on long 
and double typed column values.

Generated Thrift files have been removed for viewing pleasure. JIRA page has 
the patch with the generated Thrift files.


This addresses bug HIVE-1362.
https://issues.apache.org/jira/browse/HIVE-1362


Diffs (updated)
-

  data/files/UserVisits.dat PRE-CREATION 
  data/files/binary.txt PRE-CREATION 
  data/files/bool.txt PRE-CREATION 
  data/files/double.txt PRE-CREATION 
  data/files/employee.dat PRE-CREATION 
  data/files/employee2.dat PRE-CREATION 
  data/files/int.txt PRE-CREATION 
  ivy/libraries.properties 7ac6778 
  metastore/if/hive_metastore.thrift d4fad72 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 8fec13d
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 17b986c
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 3883b5b
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java eff44b1
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a
  metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java PRE-CREATION
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java PRE-CREATION
  metastore/src/model/package.jdo 38ce6d5
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 528a100
  metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 925938d
  ql/build.xml 5de3f78
  ql/if/queryplan.thrift 05fbf58
  ql/ivy.xml aa3b8ce
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 425900d
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 4446952
  ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 79b87f1
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7440889
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java 0b55ac4
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 344dc69
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f7257cd
  ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java e75a075
  ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java 61bc7fd
  ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 6024dd4
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 

[jira] [Updated] (HIVE-1362) column level statistics

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1362:
-

Attachment: HIVE-1362.4.patch.txt

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
 HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt






[jira] [Updated] (HIVE-1362) column level statistics

2012-10-03 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1362:
-

Attachment: HIVE-1362-gen_thrift.4.patch.txt

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, 
 HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
 HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt






RE: Hive Connection Error

2012-10-03 Thread deepak.talim
Hi,

How do I unsubscribe?


Thanks & Regards,
 
Deepak Talim
Architect | Analytics & Information Management | Wipro Technologies | Pune
Phone - VOIP: 8547081 | D: +91 +20 +39132608 | M: +91 98816 90900

-Original Message-
From: deepak.ta...@wipro.com [mailto:deepak.ta...@wipro.com]
Sent: Tuesday, July 03, 2012 5:44 PM
To: dev@hive.apache.org
Subject: Hive Connection Error

Hi,

While trying to connect to Hive using Talend 5 'HiveConnection', I get the 
following error:

Technical details:
Cloudera ver CDH3
Apache ver 1
Hive ver 7.0
HDFS

Error:
While connecting, it tries to create a directory named after the Windows AD 
user ID in the HDFS tmp directory. What permissions need to be granted, or what 
is the solution?

Error details:
===
[statistics] connecting to socket on port 3338
[statistics] connected
12/07/03 16:49:53 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
12/07/03 16:49:54 INFO metastore.HiveMetaStore: 0: Opening raw store with 
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
12/07/03 16:49:54 INFO metastore.ObjectStore: ObjectStore, initialize called
12/07/03 16:49:54 INFO DataNucleus.Persistence: Property 
datanucleus.cache.level2 unknown - will be ignored
12/07/03 16:49:54 INFO DataNucleus.Persistence: Property 
javax.jdo.option.NonTransactionalRead unknown - will be ignored
12/07/03 16:49:54 INFO DataNucleus.Persistence: = Persistence 
Configuration ===
12/07/03 16:49:54 INFO DataNucleus.Persistence: DataNucleus Persistence Factory 
- Vendor: DataNucleus  Version: 2.0.3
12/07/03 16:49:54 INFO DataNucleus.Persistence: DataNucleus Persistence Factory 
initialised for datastore 
URL=jdbc:derby:;databaseName=metastore_db;create=true 
driver=org.apache.derby.jdbc.EmbeddedDriver userName=APP
12/07/03 16:49:54 INFO DataNucleus.Persistence: 
===
12/07/03 16:49:57 INFO Datastore.Schema: Initialising Catalog , Schema APP 
using None auto-start option
12/07/03 16:49:57 INFO Datastore.Schema: Catalog , Schema APP initialised - 
managing 0 classes
12/07/03 16:49:57 INFO metastore.ObjectStore: Setting MetaStore object pin 
classes with 
hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order
12/07/03 16:49:57 INFO DataNucleus.MetaData: Registering listener for metadata 
initialisation
12/07/03 16:49:57 INFO metastore.ObjectStore: Initialized ObjectStore
12/07/03 16:49:58 WARN DataNucleus.MetaData: MetaData Parser encountered an 
error in file 
jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo
 at line 11, column 6 : cvc-elt.1: Cannot find the declaration of element 
'jdo'. - Please check your specification of DTD and the validity of the 
MetaData XML that you have specified.
12/07/03 16:49:58 WARN DataNucleus.MetaData: MetaData Parser encountered an 
error in file 
jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo
 at line 321, column 13 : The content of element type class must match 
(extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*).
 - Please check your specification of DTD and the validity of the MetaData XML 
that you have specified.
12/07/03 16:49:58 WARN DataNucleus.MetaData: MetaData Parser encountered an 
error in file 
jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo
 at line 368, column 13 : The content of element type class must match 
(extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*).
 - Please check your specification of DTD and the validity of the MetaData XML 
that you have specified.
12/07/03 16:49:59 WARN DataNucleus.MetaData: MetaData Parser encountered an 
error in file 
jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo
 at line 390, column 13 : The content of element type class must match 
(extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*).
 - Please check your specification of DTD and the validity of the MetaData XML 
that you have specified.
12/07/03 16:49:59 WARN DataNucleus.MetaData: MetaData Parser encountered an 
error in file 
jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo
 at line 425, column 13 : The content of element type class must match 
(extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*).

[jira] [Work started] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3514 started by Gang Tim Liu.

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, 
 HIVE-3514.patch.4, HIVE-3514.patch.5


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on the pieces targeted for reuse



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3514:
---

Status: Patch Available  (was: In Progress)

The patch is available in both places.

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, 
 HIVE-3514.patch.4, HIVE-3514.patch.5


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on the pieces targeted for reuse



[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.

2012-10-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3514:
---

Attachment: HIVE-3514.patch.5

 Refactor Partition Pruner so that logic can be reused.
 --

 Key: HIVE-3514
 URL: https://issues.apache.org/jira/browse/HIVE-3514
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, 
 HIVE-3514.patch.4, HIVE-3514.patch.5


 Partition Pruner has reusable logic, such as:
 1. walk through operator tree
 2. walk through operation tree
 3. create pruning predicate
 The first candidate for reuse is the list bucketing pruner.
 Some considerations:
 1. refactor for the general use case, not just list bucketing
 2. avoid over-refactoring by focusing on the pieces targeted for reuse
