[jira] [Updated] (HIVE-3276) optimize union sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3276:
-----------------------------
    Attachment: hive.3276.10.patch

optimize union sub-queries
--------------------------
                Key: HIVE-3276
                URL: https://issues.apache.org/jira/browse/HIVE-3276
            Project: Hive
         Issue Type: Bug
           Reporter: Namit Jain
           Assignee: Namit Jain
        Attachments: hive.3276.10.patch, HIVE-3276.1.patch, hive.3276.2.patch, hive.3276.3.patch, hive.3276.4.patch, hive.3276.5.patch, hive.3276.6.patch, hive.3276.7.patch, hive.3276.8.patch, hive.3276.9.patch

It might be a good idea to optimize simple union queries containing map-reduce jobs in at least one of the sub-queries. For example, a query like:

    insert overwrite table T1 partition P1
    select * from (subq1 union all subq2) u;

today creates 3 map-reduce jobs: one for subq1, another for subq2, and a final one for the union. It might be a good idea to optimize this. Instead of creating the union task, it might be simpler to create a move task (or something like a move task), where the outputs of the two sub-queries are moved to the final directory. This easily extends to more than 2 sub-queries in the union. This is very useful if there is a select * followed by a filesink after the union. It can be independently useful, and can also be used to optimize skewed joins (https://cwiki.apache.org/Hive/skewed-join-optimization.html). If there is a select or filter between the union and the filesink, the select and the filter can be moved before the union, and the follow-up job can still be removed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
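The move-task idea can be sketched outside Hive: each sub-query's map-reduce job already writes its output files to its own scratch directory, so the final union job can be replaced by renaming those files into the target partition directory. A minimal Python sketch of that file-level union (all directory and file names here are hypothetical, not Hive's actual scratch layout):

```python
import os
import shutil
import tempfile

def run_subquery(tmp_root, name, rows):
    # Each map-reduce sub-query writes its output files to its own scratch dir.
    out_dir = os.path.join(tmp_root, name)
    os.makedirs(out_dir)
    with open(os.path.join(out_dir, "000000_0"), "w") as f:
        f.writelines(r + "\n" for r in rows)
    return out_dir

def move_task_union(sub_dirs, final_dir):
    # Instead of a third MR job that re-reads both outputs, just move the
    # files into the final partition directory, prefixing to avoid name clashes.
    os.makedirs(final_dir, exist_ok=True)
    for i, d in enumerate(sub_dirs):
        for fname in os.listdir(d):
            shutil.move(os.path.join(d, fname),
                        os.path.join(final_dir, "%d_%s" % (i, fname)))

tmp = tempfile.mkdtemp()
d1 = run_subquery(tmp, "subq1", ["a", "b"])
d2 = run_subquery(tmp, "subq2", ["c"])
final = os.path.join(tmp, "T1", "P1")
move_task_union([d1, d2], final)
print(sorted(os.listdir(final)))  # ['0_000000_0', '1_000000_0']
```

The move is a metadata-only rename on the same filesystem, which is why it is so much cheaper than a map-reduce job that re-reads and re-writes the union's rows.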
[jira] [Updated] (HIVE-3276) optimize union sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3276:
-----------------------------
    Status: Patch Available  (was: Open)

comments addressed
[jira] [Updated] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive
[ https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3433:
-----------------------------
    Status: Patch Available  (was: Open)

addressed comments

Implement CUBE and ROLLUP operators in Hive
-------------------------------------------
                Key: HIVE-3433
                URL: https://issues.apache.org/jira/browse/HIVE-3433
            Project: Hive
         Issue Type: New Feature
         Components: Query Processor
           Reporter: Sambavi Muthukrishnan
           Assignee: Namit Jain
        Attachments: hive.3433.1.patch, hive.3433.2.patch
[jira] [Updated] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive
[ https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3433:
-----------------------------
    Attachment: hive.3433.2.patch
[jira] [Commented] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468475#comment-13468475 ]

Namit Jain commented on HIVE-3514:
----------------------------------
comments

Refactor Partition Pruner so that logic can be reused.
------------------------------------------------------
                Key: HIVE-3514
                URL: https://issues.apache.org/jira/browse/HIVE-3514
            Project: Hive
         Issue Type: Improvement
         Components: Query Processor
           Reporter: Gang Tim Liu
           Assignee: Gang Tim Liu
           Priority: Minor
        Attachments: HIVE-3514.patch

The Partition Pruner has reusable logic, such as:
1. walking the operator tree
2. walking the operation tree
3. creating the pruning predicate

The first candidate is the list bucketing pruner. Some considerations:
1. refactor for the general use case, not just list bucketing
2. avoid over-refactoring by focusing on the pieces targeted for reuse
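The reusable pieces listed above — walking an operator tree and collecting pruning predicates — can be sketched generically. This is an illustrative Python sketch of the pattern, not Hive's actual Java classes; the operator names and the predicate string are made up:

```python
class Op:
    """A node in a toy operator tree (e.g. TableScan -> Filter -> Select)."""
    def __init__(self, name, children=(), predicate=None):
        self.name = name
        self.children = children
        self.predicate = predicate

def walk(op, visit):
    # The reusable piece: a depth-first walk over the operator tree,
    # parameterized by a visitor so different pruners can share it.
    visit(op)
    for child in op.children:
        walk(child, visit)

def collect_pruning_predicates(root):
    # A pruner-specific visitor: gather every predicate that could be
    # pushed down to prune partitions (or, for list bucketing, directories).
    preds = []
    walk(root, lambda op: preds.append(op.predicate) if op.predicate else None)
    return preds

root = Op("TS", [Op("FIL", [Op("SEL")], predicate="ds = '2012-10-03'")])
print(collect_pruning_predicates(root))  # ["ds = '2012-10-03'"]
```

Separating the walk from the visitor is what lets both the partition pruner and a list bucketing pruner reuse the traversal while supplying their own predicate-building logic.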
[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive
[ https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468492#comment-13468492 ]

Namit Jain commented on HIVE-3433:
----------------------------------
Shreepadma, I saw your ivy.xml changes in https://reviews.apache.org/r/6878/diff/?page=1.
I can do the conversion of bitset to fastbitset once your jira is in.
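For context on why a bitset shows up in a CUBE/ROLLUP implementation: each grouping set can be represented as a bitmask over the grouping keys, which is why a primitive-long "fastbitset" is attractive compared to a heavier object-based BitSet. A hypothetical Python sketch of the representation (not Hive's actual code):

```python
def rollup_masks(n):
    # ROLLUP on n keys yields n+1 grouping sets: the prefixes of the key
    # list. Each set is a bitmask (bit i set => key i is grouped on).
    return [(1 << k) - 1 for k in range(n, -1, -1)]

def cube_masks(n):
    # CUBE yields all 2^n subsets of the grouping keys.
    return list(range((1 << n) - 1, -1, -1))

def apply_mask(row_keys, mask):
    # NULL out the keys whose bit is cleared, mimicking the rows a
    # grouping set emits for each input row.
    return tuple(k if mask & (1 << i) else None
                 for i, k in enumerate(row_keys))

print(rollup_masks(2))              # [3, 1, 0]
print(apply_mask(("us", "ca"), 1))  # ('us', None)
```

Since each grouping set is just an integer bitmask, testing membership and iterating sets are single machine-word operations, with no per-row object allocation.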
Re: Review Request: HIVE-1362: Support for column statistics in Hive
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12131
-----------------------------------------------------------

Some questions:

How much does this interact with hive.stats.reliable?
There are many places with a TODO.
The formatting needs to be fixed in many places.
Some functions are returning HashSet etc. - should they be changed to return a Set instead?
Can you make sure you use complete variable names?

metastore/if/hive_metastore.thrift
https://reviews.apache.org/r/6878/#comment25818
    Does it make sense to add a thrift API for updating statistics? There doesn't exist an interface for updating row-level statistics.

ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
https://reviews.apache.org/r/6878/#comment25819
    Can you use the full variable name instead of Rwt?

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
https://reviews.apache.org/r/6878/#comment25820
    LHS should not be an ArrayList. Please fix all such occurrences.

- Namit Jain

On Oct. 3, 2012, 3:10 a.m., Shreepadma Venugopalan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6878/
> -----------------------------------------------------------
>
> (Updated Oct. 3, 2012, 3:10 a.m.)
>
> Review request for hive and Carl Steinbach.
>
> Description
> -----------
> This patch implements version 1 of the column statistics project in Hive. It adds support for computing and persisting statistical summaries of column values in Hive tables and partitions. In order to support column statistics in Hive, this patch does the following:
>
> * Adds a new compute stats UDAF to compute scalar statistics for all primitive Hive data types. In version 1 of the project, we support the following scalar statistics on primitive types: an estimate of the number of distinct values, the number of null values, the number of trues/falses for boolean-typed columns, max and avg length for string- and binary-typed columns, and max and min value for long- and double-typed columns. Note that version 1 of the column stats project includes support for column statistics both at the table and partition level.
> * Adds Metastore schema tables to persist the newly added statistics at both the table and partition level.
> * Adds a Metastore Thrift API to persist, retrieve and delete column statistics at both the table and partition level. Please refer to the following wiki link for the details of the schema and the Thrift API changes: https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
> * Extends the analyze table compute statistics statement to trigger statistics computation and persistence for one or more columns. Please note that statistics for multiple columns are computed through a single scan of the table data. Please refer to the following wiki link for the syntax changes: https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive
>
> One thing missing from the patch at this point is the metastore upgrade scripts for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to finalize the metastore schema changes before I go ahead and add the upgrade scripts. In a follow-on patch, as part of version 2 of the column statistics project, we will add support for computing, persisting and retrieving histograms on long- and double-typed column values.
>
> Generated Thrift files have been removed for viewing pleasure. The JIRA page has the patch with the generated Thrift files.
>
> This addresses bug HIVE-1362.
> https://issues.apache.org/jira/browse/HIVE-1362
>
> Diffs
> -----
>   data/files/UserVisits.dat PRE-CREATION
>   data/files/binary.txt PRE-CREATION
>   data/files/bool.txt PRE-CREATION
>   data/files/double.txt PRE-CREATION
>   data/files/employee.dat PRE-CREATION
>   data/files/employee2.dat PRE-CREATION
>   data/files/int.txt PRE-CREATION
>   ivy/libraries.properties 7ac6778
>   metastore/if/hive_metastore.thrift d4fad72
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 8fec13d
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 17b986c
>   metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 3883b5b
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java eff44b1
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a
>   metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java PRE-CREATION
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java PRE-CREATION
>   metastore/src/model/package.jdo 38ce6d5
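As a rough illustration of the kind of per-column summary the compute stats UDAF described above produces in one pass over the data: for a long-typed column it tracks nulls, min/max, and a distinct-value count. This is a simplified Python sketch, not Hive's UDAF; in particular, Hive estimates the distinct count approximately, whereas this sketch computes it exactly with a set:

```python
def compute_long_stats(values):
    # One pass over a long-typed column: count nulls, track min/max,
    # and count distinct non-null values (exactly, for illustration).
    nulls, lo, hi, seen = 0, None, None, set()
    for v in values:
        if v is None:
            nulls += 1
            continue
        seen.add(v)
        lo = v if lo is None else min(lo, v)
        hi = v if hi is None else max(hi, v)
    return {"numNulls": nulls, "min": lo, "max": hi, "ndv": len(seen)}

print(compute_long_stats([3, None, 7, 3, None]))
# {'numNulls': 2, 'min': 3, 'max': 7, 'ndv': 2}
```

Because every statistic here is updated incrementally from a single value, stats for many columns can share one scan of the table, which is the point made in the description.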
[jira] [Commented] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468502#comment-13468502 ]

Namit Jain commented on HIVE-1362:
----------------------------------
Are the stats collected while the table is being scanned, or only as part of analyze?

column level statistics
-----------------------
                Key: HIVE-1362
                URL: https://issues.apache.org/jira/browse/HIVE-1362
            Project: Hive
         Issue Type: Sub-task
         Components: Statistics
           Reporter: Ning Zhang
           Assignee: Shreepadma Venugopalan
        Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt
[jira] [Updated] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1362:
-----------------------------
    Status: Open  (was: Patch Available)
Insert into vs Insert overwrite
Hi all,

I would like to know the difference between Hive's INSERT INTO and INSERT OVERWRITE for a Hive external table.

Thanks,
Kasun
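The behavioral difference can be modeled simply: INSERT OVERWRITE replaces the existing data in the target table or partition (for an external table, the files under its location are replaced), while INSERT INTO appends to it. A toy Python model of the semantics (this is not Hive code, just an illustration of the contract):

```python
# Model a table as a mapping from partition spec to its rows.
table = {"p=1": ["old_row"]}

def insert_overwrite(tbl, part, rows):
    # INSERT OVERWRITE: existing data in the partition is discarded.
    tbl[part] = list(rows)

def insert_into(tbl, part, rows):
    # INSERT INTO: existing data in the partition is kept; rows append.
    tbl.setdefault(part, []).extend(rows)

insert_into(table, "p=1", ["a"])
print(table["p=1"])  # ['old_row', 'a']

insert_overwrite(table, "p=1", ["b"])
print(table["p=1"])  # ['b']
```

Note this models only the data semantics; dropping an external table later still leaves the files in place, which is the usual external-table distinction.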
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #157
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/157/ -- [...truncated 5071 lines...] A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/QueryPlan.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Adjacency.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Graph.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Task.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/AdjacencyType.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Stage.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/TaskType.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Query.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/NodeType.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/Operator.java A ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java A ql/src/gen/thrift/gen-php A ql/src/gen/thrift/gen-php/queryplan A ql/src/gen/thrift/gen-php/queryplan/queryplan_types.php A ql/src/gen-javabean A ql/src/gen-javabean/org A ql/src/gen-javabean/org/apache A ql/src/gen-javabean/org/apache/hadoop A ql/src/gen-javabean/org/apache/hadoop/hive A ql/src/gen-javabean/org/apache/hadoop/hive/ql A ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan A ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan/api A ql/src/gen-php A ql/build.xml A ql/if A ql/if/queryplan.thrift A pdk A pdk/ivy.xml A pdk/scripts A pdk/scripts/class-registration.xsl A pdk/scripts/build-plugin.xml A pdk/scripts/README A pdk/src A pdk/src/java A pdk/src/java/org A pdk/src/java/org/apache A 
pdk/src/java/org/apache/hive A pdk/src/java/org/apache/hive/pdk A pdk/src/java/org/apache/hive/pdk/FunctionExtractor.java A pdk/src/java/org/apache/hive/pdk/HivePdkUnitTest.java A pdk/src/java/org/apache/hive/pdk/HivePdkUnitTests.java A pdk/src/java/org/apache/hive/pdk/PluginTest.java A pdk/test-plugin A pdk/test-plugin/test A pdk/test-plugin/test/cleanup.sql A pdk/test-plugin/test/onerow.txt A pdk/test-plugin/test/setup.sql A pdk/test-plugin/src A pdk/test-plugin/src/org A pdk/test-plugin/src/org/apache A pdk/test-plugin/src/org/apache/hive A pdk/test-plugin/src/org/apache/hive/pdktest A pdk/test-plugin/src/org/apache/hive/pdktest/Rot13.java A pdk/test-plugin/build.xml A pdk/build.xml A build-offline.xml U. At revision 1393573 no change for http://svn.apache.org/repos/asf/hive/branches/branch-0.9 since the previous build [hive] $ /home/hudson/tools/ant/apache-ant-1.8.1/bin/ant -Dversion=0.9.1-SNAPSHOT very-clean tar binary Buildfile: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build.xml ivy-init-dirs: [echo] Project: hive [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/lib [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/report [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/maven ivy-download: [echo] Project: hive [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/lib/ivy-2.1.0.jar ivy-probe-antlib: [echo] Project: hive ivy-init-antlib: [echo] Project: hive ivy-clean-cache: [ivy:cleancache] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ :: [ivy:cleancache] :: loading 
settings :: url = jar:file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/lib/ivy-2.1.0.jar!/org/apache/ivy/core/settings/ivysettings.xml clean: [echo] Project: hive clean: [echo] Project: anttasks clean: [echo] Project: shims clean: [echo] Project: common clean: [echo] Project: serde clean: [echo] Project: metastore
Re: Review Request: HIVE-1362: Support for column statistics in Hive
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12133
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
https://reviews.apache.org/r/6878/#comment25822
    I'll replace LHS with generic java types.

- Shreepadma Venugopalan
[jira] [Commented] (HIVE-3501) Track table and keys used in joins and group bys for logging
[ https://issues.apache.org/jira/browse/HIVE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468662#comment-13468662 ]

Sambavi Muthukrishnan commented on HIVE-3501:
---------------------------------------------
Thanks Carl. Please let me know if you hit any issues with the tests.

Track table and keys used in joins and group bys for logging
------------------------------------------------------------
                Key: HIVE-3501
                URL: https://issues.apache.org/jira/browse/HIVE-3501
            Project: Hive
         Issue Type: Task
         Components: Query Processor
   Affects Versions: 0.10.0
           Reporter: Sambavi Muthukrishnan
           Assignee: Sambavi Muthukrishnan
           Priority: Minor
        Attachments: table_access_keys.1.patch, table_access_keys.2.patch, table_access_keys.3.patch, table_access_keys.4.patch, table_access_keys.5.patch
  Original Estimate: 96h
 Remaining Estimate: 96h

For all operators that could benefit from bucketing, it will be useful to keep track of and log the table names and key column names required for the operator to be converted to the bucketed version. This task is to track this information for joins and group bys when the keys can be directly mapped back to table scans and columns on those tables. This information will be tracked on the QueryPlan object so it is available to any pre/post execution hooks for logging.
[jira] [Commented] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468668#comment-13468668 ]

Shreepadma Venugopalan commented on HIVE-1362:
----------------------------------------------
@Namit: The stats are collected as part of analyze. We will look into implicit stats collection, i.e. when the table is scanned/loaded, in the next version of this project.
Re: Review Request: HIVE-1362: Support for column statistics in Hive
On Oct. 3, 2012, 11:50 a.m., namit jain wrote:
> Some questions:
> How much does this interact with hive.stats.reliable?
> There are many places with a TODO.
> The formatting needs to be fixed in many places.
> Some functions are returning HashSet etc. - should they be changed to return a Set instead?
> Can you make sure you use complete variable names?

This patch doesn't interact in any way with hive.stats.reliable. Will rename any variables with shortened names to use the full name. Will return generic Java types instead of ArrayList, HashSet etc. Will fix formatting. There is only one place with a real TODO - the implementation of the Flajolet-Martin sketch. I was planning to fix that TODO by making the accuracy percentage a configurable parameter. The other places with a TODO are auto-generated code which says "TODO - Auto Generated method".

On Oct. 3, 2012, 11:50 a.m., namit jain wrote:
> ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java, line 52
> https://reviews.apache.org/r/6878/diff/3/?file=173533#file173533line52
> can you use full variable name instead of Rwt

Will do.

On Oct. 3, 2012, 11:50 a.m., namit jain wrote:
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java, line 161
> https://reviews.apache.org/r/6878/diff/3/?file=173542#file173542line161
> LHS should not be an ArrayList. Please fix all such occurrences.

Will change any LHS occurrences to use generic Java types.

- Shreepadma

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6878/#review12131
-----------------------------------------------------------
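The Flajolet-Martin sketch mentioned in the reply above estimates the number of distinct values from hash bit patterns without storing the values themselves. A minimal single-sketch Python illustration (Hive's actual implementation, hash function, and accuracy handling will differ; the MD5 hash and the classic correction constant here are just one concrete choice):

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant

def trailing_zeros(x, width=32):
    # Number of trailing zero bits in x (width if x == 0).
    if x == 0:
        return width
    n = 0
    while x & 1 == 0:
        x >>= 1
        n += 1
    return n

def fm_estimate(values):
    # Hash each value and record which trailing-zero counts were observed.
    # About 1 in 2^(k+1) hashes ends in exactly k zeros, so the position R
    # of the lowest *unset* bit grows like log2(NDV); estimate = 2^R / PHI.
    bitmap = 0
    for v in values:
        h = int(hashlib.md5(str(v).encode()).hexdigest(), 16) & 0xFFFFFFFF
        bitmap |= 1 << trailing_zeros(h)
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (1 << r) / PHI

print(round(fm_estimate(range(1000))))
```

A single sketch like this has high variance (the estimate moves in powers of two), which is why practical implementations average many sketches and why making the accuracy a configurable parameter, as proposed above, matters.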
[jira] [Commented] (HIVE-3276) optimize union sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468682#comment-13468682 ]

Namit Jain commented on HIVE-3276:
----------------------------------
The tests finished fine.
[jira] [Updated] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive
[ https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3433:
-----------------------------
    Attachment: hive.3433.3.patch
[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive
[ https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468684#comment-13468684 ] Namit Jain commented on HIVE-3433: -- [~shreepadma], thanks - using fastbitset instead.
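The comment above mentions switching to a fast bit set to track which grouping columns are active in each grouping set. One way to picture that representation (plain Python integers standing in for the bit set; this is an illustration of the idea, not Hive's code):

```python
def rollup_masks(n):
    """ROLLUP over n grouping columns: one mask per prefix, from all
    columns grouped down to the grand total (bit i set = column i kept)."""
    return [(1 << k) - 1 for k in range(n, -1, -1)]

def cube_masks(n):
    """CUBE over n grouping columns: every subset of the columns."""
    return list(range((1 << n) - 1, -1, -1))

# ROLLUP(a, b) -> grouping sets (a, b), (a), (): masks 0b11, 0b01, 0b00
print(rollup_masks(2))   # [3, 1, 0]
# CUBE(a, b) -> grouping sets (a, b), (a), (b), ()
print(cube_masks(2))     # [3, 2, 1, 0]
```

A bit set makes testing "is column i grouped in this grouping set?" a single bit probe, which is why it is attractive for the operator's inner loop.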
[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive
[ https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468686#comment-13468686 ] Shreepadma Venugopalan commented on HIVE-3433: -- Thanks for making the change, Namit.
[jira] [Work started] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3514 started by Gang Tim Liu. Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2 The Partition Pruner has reusable logic, such as: 1. walk the operator tree 2. walk the operation tree 3. create the pruning predicate The first candidate for reuse is the list bucketing pruner. Some considerations: 1. refactor for the general use case, not just list bucketing 2. avoid over-refactoring by focusing on the pieces targeted for reuse
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3514: --- Attachment: HIVE-3514.patch.2
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3514: --- Status: Patch Available (was: In Progress) The patch is available both as an attachment and on D5727. Thanks.
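The reusable pieces HIVE-3514 describes — walking an operator tree and building a pruning predicate from the filters found along the way — can be sketched generically. All class and operator names below are illustrative stand-ins, not Hive's actual `ql` classes:

```python
class Op:
    """A toy query-plan operator with a kind, an optional filter
    expression, and child operators."""
    def __init__(self, kind, expr=None, children=()):
        self.kind, self.expr, self.children = kind, expr, children

def collect_predicates(op):
    """Post-order walk of the operator tree, gathering every FILTER
    expression encountered (the shared 'walk the tree' step)."""
    preds = []
    for child in op.children:
        preds.extend(collect_predicates(child))
    if op.kind == "FILTER":
        preds.append(op.expr)
    return preds

def pruning_predicate(op):
    """AND the collected filters into a single pruning predicate,
    defaulting to 'true' when no filter constrains the scan."""
    preds = collect_predicates(op)
    return " AND ".join(preds) if preds else "true"

tree = Op("SELECT", children=[
    Op("FILTER", expr="ds = '2012-10-03'", children=[
        Op("FILTER", expr="hr < 12", children=[Op("TABLESCAN")])])])
print(pruning_predicate(tree))  # hr < 12 AND ds = '2012-10-03'
```

The point of the refactor is that only the last step differs between the partition pruner and a list-bucketing pruner; the walk itself is shared.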
[jira] [Updated] (HIVE-3498) hivetest.py fails with --revision option
[ https://issues.apache.org/jira/browse/HIVE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3498: Resolution: Fixed Fix Version/s: 0.10.0 Status: Resolved (was: Patch Available) Committed, thanks Ivan. hivetest.py fails with --revision option Key: HIVE-3498 URL: https://issues.apache.org/jira/browse/HIVE-3498 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Ivan Gorbachev Assignee: Ivan Gorbachev Labels: testing Fix For: 0.10.0 Attachments: jira-3498.0.patch How to reproduce outside hivetest.py: 1. Clone git://git.apache.org/hive.git 2. Run ant arc-setup 3. Run arc patch rev Output: {quote} This diff is against commit https://svn.apache.org/repos/asf/hive/trunk@1382631, but the commit is nowhere in the working copy. Try to apply it against the current working copy state? (d5f66df1edfff2645f225298e225dbccc70d97ff) [Y/n] {quote} If you choose 'Y', it asks you to complete the 'merge-message' and then prints: {quote} Select a Default Commit Range You're running a command which operates on a range of revisions (usually, from some revision to HEAD) but have not specified the revision that should determine the start of the range. Previously, arc assumed you meant 'HEAD^' when you did not specify a start revision, but this behavior does not make much sense in most workflows outside of Facebook's historic git-svn workflow. arc no longer assumes 'HEAD^'. You must specify a relative commit explicitly when you invoke a command (e.g., `arc diff HEAD^`, not just `arc diff`) or select a default for this working copy. In most cases, the best default is 'origin/master'. You can also select 'HEAD^' to preserve the old behavior, or some other remote or branch. But you almost certainly want to select 'origin/master'. (Technically: the merge-base of the selected revision and HEAD is used to determine the start of the commit range.) What default do you want to use?
[origin/master] {quote} An svn checkout does not exhibit the same behavior.
Re: Review Request: HIVE-1362: Support for column statistics in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6878/ --- (Updated Oct. 3, 2012, 7:16 p.m.) Review request for hive and Carl Steinbach. Changes --- This revision addresses the review comments from revision #3, particularly the following: * Fixes the TODOs. There is still one outstanding TODO - make the accuracy a user-provided parameter for the Flajolet-Martin sketch in NumDistinctValueEstimator.java * Fixes the formatting * Uses Java generics on the LHS except in StatsSemanticAnalyzer.java. StatsSemanticAnalyzer.java inherits from BaseSemanticAnalyzer.java, and one of the methods StatsSemanticAnalyzer overrides from BaseSemanticAnalyzer returns a HashSet instead of a Set. This patch doesn't use generics on the LHS in this particular instance. This is beyond the scope of this JIRA; I will be happy to do it as part of a cleanup JIRA. * Replaces shortened variable names with longer, descriptive variable names Description --- This patch implements version 1 of the column statistics project in Hive. It adds support for computing and persisting a statistical summary of column values in Hive tables and partitions. In order to support column statistics in Hive, this patch does the following: * Adds a new compute stats UDAF to compute scalar statistics for all primitive Hive data types. In version 1 of the project, we support the following scalar statistics on primitive types: an estimate of the number of distinct values, the number of null values, the number of trues/falses for boolean-typed columns, max and avg length for string- and binary-typed columns, and max and min value for long- and double-typed columns. Note that version 1 of the column stats project includes support for column statistics at both the table and partition level. * Adds Metastore schema tables to persist the newly added statistics at both the table and partition level. * Adds a Metastore Thrift API to persist, retrieve and delete column statistics at both the table and partition level. 
Please refer to the following wiki link for the details of the schema and the Thrift API changes - https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive * Extends the analyze table compute statistics statement to trigger statistics computation and persistence for one or more columns. Please note that statistics for multiple columns are computed through a single scan of the table data. Please refer to the following wiki link for the syntax changes - https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive One thing missing from the patch at this point is the metastore upgrade scripts for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to finalize the metastore schema changes before I go ahead and add the upgrade scripts. In a follow-on patch, as part of version 2 of the column statistics project, we will add support for computing, persisting and retrieving histograms on long- and double-typed column values. Generated Thrift files have been removed for easier viewing. The JIRA page has the patch with the generated Thrift files. This addresses bug HIVE-1362. 
https://issues.apache.org/jira/browse/HIVE-1362 Diffs (updated) - data/files/UserVisits.dat PRE-CREATION data/files/binary.txt PRE-CREATION data/files/bool.txt PRE-CREATION data/files/double.txt PRE-CREATION data/files/employee.dat PRE-CREATION data/files/employee2.dat PRE-CREATION data/files/int.txt PRE-CREATION ivy/libraries.properties 7ac6778 metastore/if/hive_metastore.thrift d4fad72 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 8fec13d metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 17b986c metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 3883b5b metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java eff44b1 metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java PRE-CREATION metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java PRE-CREATION metastore/src/model/package.jdo 38ce6d5 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 528a100 metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 925938d ql/build.xml 5de3f78 ql/if/queryplan.thrift 05fbf58 ql/ivy.xml aa3b8ce ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 425900d ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 4446952 ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 79b87f1
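The review above mentions a Flajolet-Martin sketch for the distinct-value estimate. A hedged sketch of the basic technique (illustrative Python, not the NumDistinctValueEstimator implementation under review; salted SHA-1 stands in for the hash family, and real implementations use cheaper hashes):

```python
import hashlib

def fm_estimate_ndv(values, num_bitvectors=16):
    """Flajolet-Martin style estimate of the number of distinct values:
    track, per hash function, the longest run of trailing zero bits seen
    in any hashed value, then average across hash functions. The accuracy
    knob the review mentions corresponds to num_bitvectors."""
    max_r = [0] * num_bitvectors
    for v in values:
        for i in range(num_bitvectors):
            h = int.from_bytes(
                hashlib.sha1(f"{i}:{v}".encode()).digest()[:8], "big")
            # r = number of trailing zero bits in the hash of v.
            r = 64 if h == 0 else (h & -h).bit_length() - 1
            if r > max_r[i]:
                max_r[i] = r
    avg = sum(max_r) / num_bitvectors
    return (2 ** avg) / 0.77351  # standard FM correction constant

# The estimate lands in the right ballpark for 1000 distinct values,
# and duplicates do not inflate it.
print(fm_estimate_ndv(range(1000)))
```

Because only the maximum trailing-zero run is kept per hash function, the sketch uses constant memory regardless of table size, which is what makes it suitable for a single-scan stats UDAF.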
[jira] [Updated] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1362: - Status: Patch Available (was: Open) column level statistics --- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt
[jira] [Updated] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1362: - Attachment: HIVE-1362.3.patch.txt
[jira] [Commented] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468754#comment-13468754 ] Shreepadma Venugopalan commented on HIVE-1362: -- The latest revision, which addresses Namit's comments, is on Review Board.
[jira] [Updated] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1362: - Attachment: HIVE-1362-gen_thrift.3.patch.txt
[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable
[ https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghotham Murthy updated HIVE-3522: --- Attachment: hive-3522.1.patch Make separator for Entity name configurable --- Key: HIVE-3522 URL: https://issues.apache.org/jira/browse/HIVE-3522 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Raghotham Murthy Assignee: Raghotham Murthy Priority: Trivial Attachments: hive-3522.1.patch Right now it's hard-coded to '@'
[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable
[ https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghotham Murthy updated HIVE-3522: --- Status: Patch Available (was: In Progress)
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #157
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/157/ -- [...truncated 36610 lines...] [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2012-10-03_13-48-40_623_4031868789239429849/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201210031348_928690341.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath '/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt [junit] Copying file: file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/jenkins/hive_2012-10-03_13-48-46_620_6433981211820663306/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2012-10-03_13-48-46_620_6433981211820663306/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201210031348_424709819.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable 
[junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201210031348_1052999737.txt [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201210031348_252025720.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type:
[jira] [Created] (HIVE-3526) Column Statistics - Add support for equi-height histograms on numeric columns
Shreepadma Venugopalan created HIVE-3526: Summary: Column Statistics - Add support for equi-height histograms on numeric columns Key: HIVE-3526 URL: https://issues.apache.org/jira/browse/HIVE-3526 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan This JIRA covers the task of adding support for equi-height histograms on numeric columns in Hive tables and partitions. This task involves a) implementing a UDAF to compute equi-height histograms on numeric columns, b) persisting the histogram to the metastore along with the other column statistics, c) enhancing the Thrift API to retrieve the histogram along with the other statistics, and d) extending the grammar of ANALYZE to allow the user to request histograms and specify the number of bins.
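Equi-height (also called equi-depth) histograms choose bin boundaries so each bin covers roughly the same number of rows, which makes them robust on skewed columns. A small sketch of the boundary computation (illustrative Python, not the UDAF proposed in this JIRA):

```python
def equi_height_histogram(values, num_bins):
    """Return num_bins upper bin boundaries such that each bin holds
    roughly the same number of rows. Boundary k is the value at rank
    k*n/num_bins in the sorted data, i.e. the k-th quantile."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (k * n) // num_bins)]
            for k in range(1, num_bins + 1)]

# Skewed data: half the rows share the value 1, so the first two
# boundaries crowd around the dense region.
data = [1] * 50 + list(range(2, 52))  # 100 values
print(equi_height_histogram(data, 4))  # [1, 2, 27, 51]
```

A production implementation would compute approximate quantiles in one streaming pass rather than sorting, but the boundary definition is the same.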
[jira] [Created] (HIVE-3525) Avro Maps with Nullable Values fail with NPE
Sean Busbey created HIVE-3525: - Summary: Avro Maps with Nullable Values fail with NPE Key: HIVE-3525 URL: https://issues.apache.org/jira/browse/HIVE-3525 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Sean Busbey When working against current trunk@1393794, using a backing Avro schema that has a Map field with nullable values causes an NPE on deserialization when the map contains a null value.
[jira] [Commented] (HIVE-3526) Column Statistics - Add support for equi-height histograms on numeric columns
[ https://issues.apache.org/jira/browse/HIVE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468905#comment-13468905 ] Shreepadma Venugopalan commented on HIVE-3526: -- HIVE-1362 covers the task of adding support for column level statistics in Hive.
[jira] [Updated] (HIVE-3525) Avro Maps with Nullable Values fail with NPE
[ https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HIVE-3525: -- Attachment: HIVE-3525.1.patch.txt A patch with unit tests that express the NPE on deserialization and during the round trip for serialization. It also shows that the object inspector is behaving correctly.
[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE
[ https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468907#comment-13468907 ] Shreepadma Venugopalan commented on HIVE-3525: -- @Sean: Can you post a review request on Review Board or on Phabricator? Thanks.
[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE
[ https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468910#comment-13468910 ] Sean Busbey commented on HIVE-3525: --- It looks like this is because the Avro SerDe uses a Hashtable when reading out Avro maps. The BinarySortableSerDe uses HashMap, so presumably the Avro SerDe could as well.
[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE
[ https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468914#comment-13468914 ] Sean Busbey commented on HIVE-3525: --- [~shreepadma] Sure thing. Should I wait until the patch contains a solution, or post it now while it's still just the tests? Avro Maps with Nullable Values fail with NPE Key: HIVE-3525 URL: https://issues.apache.org/jira/browse/HIVE-3525 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Sean Busbey Attachments: HIVE-3525.1.patch.txt When working against current trunk@1393794, using a backing Avro schema that has a Map field with nullable values causes an NPE on deserialization when the map contains a null value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE
[ https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468915#comment-13468915 ] Shreepadma Venugopalan commented on HIVE-3525: -- It looks like you have a patch attached to the JIRA page. Is this a work-in-progress patch? Is this something you would like us to review? It's a lot easier to perform the review on Phabricator/Review Board. Avro Maps with Nullable Values fail with NPE Key: HIVE-3525 URL: https://issues.apache.org/jira/browse/HIVE-3525 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Sean Busbey Attachments: HIVE-3525.1.patch.txt When working against current trunk@1393794, using a backing Avro schema that has a Map field with nullable values causes an NPE on deserialization when the map contains a null value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3437) 0.23 compatibility: fix unit tests when building against 0.23
[ https://issues.apache.org/jira/browse/HIVE-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Drome updated HIVE-3437: -- Attachment: HIVE-3437-trunk-3.patch HIVE-3437-0.9-3.patch Updates that address the Review Board comments. Fix that gets the NegativeMinimrCliDriver tests working with Hadoop 0.23.3. 0.23 compatibility: fix unit tests when building against 0.23 - Key: HIVE-3437 URL: https://issues.apache.org/jira/browse/HIVE-3437 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.9.0, 0.10.0 Reporter: Chris Drome Assignee: Chris Drome Fix For: 0.9.0, 0.10.0 Attachments: HIVE-3437-0.9-1.patch, HIVE-3437-0.9-2.patch, HIVE-3437-0.9-3.patch, HIVE-3437-0.9.patch, HIVE-3437-trunk-1.patch, HIVE-3437-trunk-2.patch, HIVE-3437-trunk-3.patch, HIVE-3437-trunk.patch Many unit tests fail as a result of building the code against Hadoop 0.23. Initial focus will be to fix 0.9. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3527) Allow CREATE TABLE LIKE command to take TBLPROPERTIES
Kevin Wilfong created HIVE-3527: --- Summary: Allow CREATE TABLE LIKE command to take TBLPROPERTIES Key: HIVE-3527 URL: https://issues.apache.org/jira/browse/HIVE-3527 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong CREATE TABLE ... LIKE ... commands currently don't take TBLPROPERTIES. I think it would be a useful feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3527) Allow CREATE TABLE LIKE command to take TBLPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468934#comment-13468934 ] Kevin Wilfong commented on HIVE-3527: - https://reviews.facebook.net/D5847 Allow CREATE TABLE LIKE command to take TBLPROPERTIES - Key: HIVE-3527 URL: https://issues.apache.org/jira/browse/HIVE-3527 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3527.1.patch.txt CREATE TABLE ... LIKE ... commands currently don't take TBLPROPERTIES. I think it would be a useful feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3527) Allow CREATE TABLE LIKE command to take TBLPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3527: Status: Patch Available (was: Open) Allow CREATE TABLE LIKE command to take TBLPROPERTIES - Key: HIVE-3527 URL: https://issues.apache.org/jira/browse/HIVE-3527 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3527.1.patch.txt CREATE TABLE ... LIKE ... commands currently don't take TBLPROPERTIES. I think it would be a useful feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3527) Allow CREATE TABLE LIKE command to take TBLPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3527: Attachment: HIVE-3527.1.patch.txt Allow CREATE TABLE LIKE command to take TBLPROPERTIES - Key: HIVE-3527 URL: https://issues.apache.org/jira/browse/HIVE-3527 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3527.1.patch.txt CREATE TABLE ... LIKE ... commands currently don't take TBLPROPERTIES. I think it would be a useful feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: Unit tests for reproducing HIVE-3525
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7430/ --- Review request for hive. Description --- Unit test reproducing HIVE-3525 Diffs - /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 1393794 /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java 1393794 /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java 1393794 Diff: https://reviews.apache.org/r/7430/diff/ Testing --- Run additional tests after patching against trunk. Uses an Avro Schema that has a single field which is a Map that allows null values. Object Inspector properly hides the union with null, but the deserializer can't actually handle null values. Thanks, Sean Busbey
Review Request: Unit tests to show failure to handle nullable complex types on serialization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7431/ --- Review request for hive. Description --- Tests that express AvroSerDe's erroneous handling of Nullable complex types on serialization Diffs - /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java 1393805 Diff: https://reviews.apache.org/r/7431/diff/ Testing --- Adds 7 tests that check each of the Avro types that Serialization needs to use a user-provided schema to handle. Thanks, Sean Busbey
[jira] [Updated] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns
[ https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo updated HIVE-3467: --- Attachment: HIVE-3467.2.patch.txt BucketMapJoinOptimizer should optimize joins on partition columns - Key: HIVE-3467 URL: https://issues.apache.org/jira/browse/HIVE-3467 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Zhenxiao Luo Attachments: HIVE-3467.1.patch.txt, HIVE-3467.2.patch.txt Consider the query: SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key; where t1 and t2 are partitioned by part and bucketed by key. Suppose part takes values 1 and 2, and t1 and t2 are bucketed into 2 buckets. The bucket map join optimizer will put the first bucket of the part=1 and part=2 partitions of t2 into the same mapper as that of the part=1 partition of t1. It will do the same for the part=2 partition of t1. It could take advantage of the partition values and send the first bucket of only the part=1 partitions of t1 and t2 into one mapper and the first bucket of only the part=2 partitions into another. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns
[ https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468946#comment-13468946 ] Zhenxiao Luo commented on HIVE-3467: Comments addressed. Review request resubmitted at: https://reviews.facebook.net/D5769 BucketMapJoinOptimizer should optimize joins on partition columns - Key: HIVE-3467 URL: https://issues.apache.org/jira/browse/HIVE-3467 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Zhenxiao Luo Attachments: HIVE-3467.1.patch.txt, HIVE-3467.2.patch.txt Consider the query: SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key; where t1 and t2 are partitioned by part and bucketed by key. Suppose part takes values 1 and 2, and t1 and t2 are bucketed into 2 buckets. The bucket map join optimizer will put the first bucket of the part=1 and part=2 partitions of t2 into the same mapper as that of the part=1 partition of t1. It will do the same for the part=2 partition of t1. It could take advantage of the partition values and send the first bucket of only the part=1 partitions of t1 and t2 into one mapper and the first bucket of only the part=2 partitions into another. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns
[ https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo updated HIVE-3467: --- Status: Patch Available (was: Open) BucketMapJoinOptimizer should optimize joins on partition columns - Key: HIVE-3467 URL: https://issues.apache.org/jira/browse/HIVE-3467 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Zhenxiao Luo Attachments: HIVE-3467.1.patch.txt, HIVE-3467.2.patch.txt Consider the query: SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key; where t1 and t2 are partitioned by part and bucketed by key. Suppose part takes values 1 and 2, and t1 and t2 are bucketed into 2 buckets. The bucket map join optimizer will put the first bucket of the part=1 and part=2 partitions of t2 into the same mapper as that of the part=1 partition of t1. It will do the same for the part=2 partition of t1. It could take advantage of the partition values and send the first bucket of only the part=1 partitions of t1 and t2 into one mapper and the first bucket of only the part=2 partitions into another. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
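For concreteness, the mapper assignment described in HIVE-3467 can be sketched as follows. This is an illustration of the pairing logic only, not Hive's actual planner code, and the partition/bucket names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

public class BucketMapperAssignment {
    // Two partitions (part=1, part=2) and two buckets per partition,
    // matching the scenario in the issue description.
    public static void main(String[] args) {
        List<String> parts = List.of("part=1", "part=2");
        int numBuckets = 2;

        // Current behavior: for each big-table bucket, the small table
        // contributes the matching bucket from EVERY partition.
        for (String bigPart : parts) {
            for (int b = 0; b < numBuckets; b++) {
                List<String> smallSide = new ArrayList<>();
                for (String smallPart : parts) {
                    smallSide.add(smallPart + "/bucket_" + b);
                }
                System.out.println("current:   " + bigPart + "/bucket_" + b + " <- " + smallSide);
            }
        }

        // Proposed optimization: since the join condition also equates the
        // partition column, only the small-table bucket from the SAME
        // partition value can produce matches, so the rest can be dropped.
        for (String bigPart : parts) {
            for (int b = 0; b < numBuckets; b++) {
                System.out.println("optimized: " + bigPart + "/bucket_" + b
                        + " <- [" + bigPart + "/bucket_" + b + "]");
            }
        }
    }
}
```

With P partition values, the optimization shrinks each mapper's small-table input from P buckets to one, which is where the win comes from.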
[jira] [Created] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema
Sean Busbey created HIVE-3528: - Summary: Avro SerDe doesn't handle serializing Nullable types that require access to a Schema Key: HIVE-3528 URL: https://issues.apache.org/jira/browse/HIVE-3528 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Sean Busbey Deserialization properly handles hiding Nullable Avro types, including complex types like record, map, array, etc. However, when serialization attempts to write out these types, it erroneously uses the union schema that contains null and the other type. This results in schema-mismatch errors for Record, Array, Enum, Fixed, and Bytes. Here's a [Review Board request with unit tests that express the problem|https://reviews.apache.org/r/7431/], as well as one that supports the case that the failure occurs only when the schema is needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
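The fix direction implied here is that serialization of a [null, T] union should resolve to the non-null branch T and write with that schema, rather than handing the whole union to the writer. A minimal sketch in plain Java (the helper name and the union-as-list-of-type-names representation are hypothetical, used in place of the real Avro Schema API):

```java
import java.util.Arrays;
import java.util.List;

public class UnionResolver {
    // Hypothetical helper: given the branches of a union schema, pick the
    // non-null branch that serialization should actually write with.
    static String resolveNonNullBranch(List<String> unionBranches) {
        return unionBranches.stream()
                .filter(t -> !t.equals("null"))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("union has no non-null branch"));
    }

    public static void main(String[] args) {
        // A nullable record type in Avro is the union ["null", "record"];
        // the writer needs the "record" branch, not the union itself.
        System.out.println(resolveNonNullBranch(Arrays.asList("null", "record")));
    }
}
```

In the real SerDe the same idea would apply at the Schema level: detect a two-branch union containing null and recurse into the other branch before serializing Record, Array, Enum, Fixed, or Bytes values.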
[jira] [Commented] (HIVE-3498) hivetest.py fails with --revision option
[ https://issues.apache.org/jira/browse/HIVE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468965#comment-13468965 ] Hudson commented on HIVE-3498: -- Integrated in Hive-trunk-h0.21 #1719 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1719/]) HIVE-3498. hivetest.py fails with --revision option. (Ivan Gorbachev via kevinwilfong) (Revision 1393676) Result = FAILURE kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1393676 Files : * /hive/trunk/testutils/ptest/hivetest.py hivetest.py fails with --revision option Key: HIVE-3498 URL: https://issues.apache.org/jira/browse/HIVE-3498 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Ivan Gorbachev Assignee: Ivan Gorbachev Labels: testing Fix For: 0.10.0 Attachments: jira-3498.0.patch How to reproduce outside hivetest.py: 1. Clone git://git.apache.org/hive.git 2. Run ant arc-setup 3. Run arc patch rev Output: {quote} This diff is against commit https://svn.apache.org/repos/asf/hive/trunk@1382631, but the commit is nowhere in the working copy. Try to apply it against the current working copy state? (d5f66df1edfff2645f225298e225dbccc70d97ff) [Y/n] {quote} If you choose 'Y', it asks you to complete a 'merge-message' and then prints: {quote} Select a Default Commit Range You're running a command which operates on a range of revisions (usually, from some revision to HEAD) but have not specified the revision that should determine the start of the range. Previously, arc assumed you meant 'HEAD^' when you did not specify a start revision, but this behavior does not make much sense in most workflows outside of Facebook's historic git-svn workflow. arc no longer assumes 'HEAD^'. You must specify a relative commit explicitly when you invoke a command (e.g., `arc diff HEAD^`, not just `arc diff`) or select a default for this working copy. In most cases, the best default is 'origin/master'. 
You can also select 'HEAD^' to preserve the old behavior, or some other remote or branch. But you almost certainly want to select 'origin/master'. (Technically: the merge-base of the selected revision and HEAD is used to determine the start of the commit range.) What default do you want to use? [origin/master] {quote} This behavior doesn't occur for an svn checkout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3522) Make separator for Entity name configurable
[ https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghotham Murthy updated HIVE-3522: --- Attachment: hive-3522.2.patch Make separator for Entity name configurable --- Key: HIVE-3522 URL: https://issues.apache.org/jira/browse/HIVE-3522 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Raghotham Murthy Assignee: Raghotham Murthy Priority: Trivial Attachments: hive-3522.1.patch, hive-3522.2.patch Right now it's hard-coded to '@' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3522) Make separator for Entity name configurable
[ https://issues.apache.org/jira/browse/HIVE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469017#comment-13469017 ] Kevin Wilfong commented on HIVE-3522: - +1 Looks good. Make separator for Entity name configurable --- Key: HIVE-3522 URL: https://issues.apache.org/jira/browse/HIVE-3522 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Raghotham Murthy Assignee: Raghotham Murthy Priority: Trivial Attachments: hive-3522.1.patch, hive-3522.2.patch Right now it's hard-coded to '@' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3529) Incorrect partition bucket/sort metadata when overwriting partition with different metadata from table
Kevin Wilfong created HIVE-3529: --- Summary: Incorrect partition bucket/sort metadata when overwriting partition with different metadata from table Key: HIVE-3529 URL: https://issues.apache.org/jira/browse/HIVE-3529 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong If you have a partition with bucket/sort metadata set, then alter the table to have different bucket/sort metadata, and then insert overwrite the partition with hive.enforce.bucketing=true and/or hive.enforce.sorting=true, the partition data will be bucketed/sorted according to the table's metadata, but the partition will retain its old metadata. This could produce wrong results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3518) QTestUtil side-effects
[ https://issues.apache.org/jira/browse/HIVE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3518: -- Attachment: HIVE-3518.D5865.1.patch navis requested code review of HIVE-3518 [jira] QTestUtil side-effects. Reviewers: JIRA DPAL-1907 QTestUtil side-effects It seems that QTestUtil has side-effects. This test (metadata_export_drop.q) causes failure of other tests on cleanup stage: Exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17 org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17 at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:845) at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:821) at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:445) at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:300) at org.apache.hadoop.hive.cli.TestCliDriver.tearDown(TestCliDriver.java:87) at junit.framework.TestCase.runBare(TestCase.java:140) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673) at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196) Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17 at org.apache.hadoop.fs.Path.initialize(Path.java:140) at org.apache.hadoop.fs.Path.<init>(Path.java:132) at org.apache.hadoop.fs.ProxyFileSystem.swizzleParamPath(ProxyFileSystem.java:56) at org.apache.hadoop.fs.ProxyFileSystem.mkdirs(ProxyFileSystem.java:214) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1120) at org.apache.hadoop.hive.ql.parse.MetaDataExportListener.export_meta_data(MetaDataExportListener.java:81) at org.apache.hadoop.hive.ql.parse.MetaDataExportListener.onEvent(MetaDataExportListener.java:106) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1024) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table(HiveMetaStore.java:1185) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:566) at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:839) ... 17 more Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17 at java.net.URI.checkPath(URI.java:1787) at java.net.URI.<init>(URI.java:735) at org.apache.hadoop.fs.Path.initialize(Path.java:137) ... 28 more Flushing 'hive.metastore.pre.event.listeners' into an empty string solves the issue. During debugging I figured out this property wasn't cleaned for other tests after it was set in metadata_export_drop.q. How to reproduce: ant test -Dtestcase=TestCliDriver -Dqfile=metadata_export_drop.q,some test.q where some test.q means any test which contains a CREATE statement. 
For example, sample10.q TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D5865 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/processors/ResetProcessor.java ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/13893/ To: JIRA, navis QTestUtil side-effects -- Key: HIVE-3518 URL: https://issues.apache.org/jira/browse/HIVE-3518 Project: Hive Issue Type: Bug Components: Testing Infrastructure, Tests Reporter: Ivan Gorbachev Attachments: HIVE-3518.D5865.1.patch, metadata_export_drop.q It seems that QTestUtil has side-effects. This test ([^metadata_export_drop.q]) causes failure of
[jira] [Commented] (HIVE-3518) QTestUtil side-effects
[ https://issues.apache.org/jira/browse/HIVE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469066#comment-13469066 ] Navis commented on HIVE-3518: - QTestUtil creates a new HiveConf per test to remove side effects, but it's not propagated to entities like SessionState or MetaStoreClient. The patch fixes this but is not yet tested. After that, I'll mark this patch-available. QTestUtil side-effects -- Key: HIVE-3518 URL: https://issues.apache.org/jira/browse/HIVE-3518 Project: Hive Issue Type: Bug Components: Testing Infrastructure, Tests Reporter: Ivan Gorbachev Attachments: HIVE-3518.D5865.1.patch, metadata_export_drop.q It seems that QTestUtil has side-effects. This test ([^metadata_export_drop.q]) causes failure of other tests on cleanup stage: {quote} Exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17 org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17 at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:845) at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:821) at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:445) at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:300) at org.apache.hadoop.hive.cli.TestCliDriver.tearDown(TestCliDriver.java:87) at junit.framework.TestCase.runBare(TestCase.java:140) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at 
org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196) Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17 at org.apache.hadoop.fs.Path.initialize(Path.java:140) at org.apache.hadoop.fs.Path.init(Path.java:132) at org.apache.hadoop.fs.ProxyFileSystem.swizzleParamPath(ProxyFileSystem.java:56) at org.apache.hadoop.fs.ProxyFileSystem.mkdirs(ProxyFileSystem.java:214) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1120) at org.apache.hadoop.hive.ql.parse.MetaDataExportListener.export_meta_data(MetaDataExportListener.java:81) at org.apache.hadoop.hive.ql.parse.MetaDataExportListener.onEvent(MetaDataExportListener.java:106) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1024) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table(HiveMetaStore.java:1185) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:566) at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:839) ... 17 more Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17 at java.net.URI.checkPath(URI.java:1787) at java.net.URI.init(URI.java:735) at org.apache.hadoop.fs.Path.initialize(Path.java:137) ... 
28 more {quote} Flushing 'hive.metastore.pre.event.listeners' into an empty string solves the issue. During debugging I figured out this property wasn't cleaned for other tests after it was set in metadata_export_drop.q. How to reproduce: {code} ant test -Dtestcase=TestCliDriver -Dqfile=metadata_export_drop.q,some test.q{code} where some test.q means any test which contains a CREATE statement. For example, sample10.q -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
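The side effect described in HIVE-3518 is a configuration property that survives into the next test's cleanup. The following minimal sketch uses a plain java.util.Properties object to stand in for the Hive configuration (the actual fix touches HiveConf propagation in QTestUtil, not this class):

```java
import java.util.Properties;

public class ConfLeakDemo {
    public static void main(String[] args) {
        // Shared configuration, as when consecutive tests reuse state.
        Properties conf = new Properties();

        // metadata_export_drop.q sets a pre-event listener...
        conf.setProperty("hive.metastore.pre.event.listeners",
                "org.apache.hadoop.hive.ql.parse.MetaDataExportListener");

        // ...and without a reset between tests, the next test still sees it,
        // so its table drops trigger the export listener and fail.
        System.out.println("leaked: " + conf.getProperty("hive.metastore.pre.event.listeners"));

        // Flushing the property to the empty string, as the report notes,
        // stops the listener from firing during the next test's cleanup.
        conf.setProperty("hive.metastore.pre.event.listeners", "");
        System.out.println("after reset: '" + conf.getProperty("hive.metastore.pre.event.listeners") + "'");
    }
}
```

The real remedy the patch aims for is stronger than a per-property reset: hand each test a genuinely fresh HiveConf and make sure SessionState and MetaStoreClient pick it up.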
[jira] [Commented] (HIVE-3529) Incorrect partition bucket/sort metadata when overwriting partition with different metadata from table
[ https://issues.apache.org/jira/browse/HIVE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469067#comment-13469067 ] Kevin Wilfong commented on HIVE-3529: - My proposed fix is to always, by default, overwrite the partition's bucket/sorting metadata with that of the table when overwriting the partition. My main motivation for doing this vs. using the partition's metadata is dynamic partitions. Having to manage all the different bucket/sorting schemes across several partitions which are overwritten dynamically sounds like a new feature rather than a bug fix, and could be done in a separate JIRA. Incorrect partition bucket/sort metadata when overwriting partition with different metadata from table -- Key: HIVE-3529 URL: https://issues.apache.org/jira/browse/HIVE-3529 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong If you have a partition with bucket/sort metadata set, then alter the table to have different bucket/sort metadata, and then insert overwrite the partition with hive.enforce.bucketing=true and/or hive.enforce.sorting=true, the partition data will be bucketed/sorted according to the table's metadata, but the partition will retain its old metadata. This could produce wrong results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3036) hive should support BigDecimal datatype
[ https://issues.apache.org/jira/browse/HIVE-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-3036: - Component/s: Types hive should support BigDecimal datatype --- Key: HIVE-3036 URL: https://issues.apache.org/jira/browse/HIVE-3036 Project: Hive Issue Type: New Feature Components: Query Processor, Types Affects Versions: 0.7.1, 0.8.0, 0.8.1 Reporter: Anurag Tangri Fix For: 0.10.0 hive has support for big int but people have use cases where they need decimal precision to a big value. Values in question are like decimal(x,y). for eg. decimal of form (17,6) which cannot be represented by float/double. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name
[ https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo updated HIVE-1977: --- Attachment: HIVE-1977.2.patch.txt DESCRIBE TABLE syntax doesn't support specifying a database qualified table name Key: HIVE-1977 URL: https://issues.apache.org/jira/browse/HIVE-1977 Project: Hive Issue Type: Bug Components: Database/Schema, Query Processor, SQL Reporter: Carl Steinbach Assignee: Zhenxiao Luo Attachments: HIVE-1977.1.patch.txt, HIVE-1977.2.patch.txt The syntax for DESCRIBE is broken. It should be: {code} DESCRIBE [EXTENDED] [database DOT]table [column] {code} but is actually {code} DESCRIBE [EXTENDED] table[DOT col_name] {code} Ref: http://dev.mysql.com/doc/refman/5.0/en/describe.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name
[ https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469096#comment-13469096 ] Zhenxiao Luo commented on HIVE-1977: @Namit: Thanks for your comments. I updated the patch, did the following: 1. Instead of adding a new conf, try database.table first, if not valid(via tableValidCheck and databaseValidCheck), try table.column. 2. get rid of isStandardSyntax, re-work the code Review request submitted at: https://reviews.facebook.net/D5763 DESCRIBE TABLE syntax doesn't support specifying a database qualified table name Key: HIVE-1977 URL: https://issues.apache.org/jira/browse/HIVE-1977 Project: Hive Issue Type: Bug Components: Database/Schema, Query Processor, SQL Reporter: Carl Steinbach Assignee: Zhenxiao Luo Attachments: HIVE-1977.1.patch.txt, HIVE-1977.2.patch.txt The syntax for DESCRIBE is broken. It should be: {code} DESCRIBE [EXTENDED] [database DOT]table [column] {code} but is actually {code} DESCRIBE [EXTENDED] table[DOT col_name] {code} Ref: http://dev.mysql.com/doc/refman/5.0/en/describe.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
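The resolution order described in the comment (try database.table first; only if that is not valid, fall back to table.column) can be sketched as follows. All names here — DescribeResolver, Target, knownDatabases, knownTables — are hypothetical illustrations, not Hive's actual classes or the patch's code.

```java
import java.util.Set;

public class DescribeResolver {
    // The resolved meaning of "DESCRIBE a.b"; column is null for a table target.
    record Target(String db, String table, String column) {}

    private final Set<String> knownDatabases;
    private final Set<String> knownTables;

    DescribeResolver(Set<String> dbs, Set<String> tables) {
        this.knownDatabases = dbs;
        this.knownTables = tables;
    }

    Target resolve(String first, String second) {
        // 1. database.table wins when both parts validate.
        if (knownDatabases.contains(first) && knownTables.contains(second)) {
            return new Target(first, second, null);
        }
        // 2. Otherwise treat the input as table.column in the current database.
        return new Target("default", first, second);
    }

    public static void main(String[] args) {
        DescribeResolver r = new DescribeResolver(Set.of("sales"), Set.of("orders"));
        System.out.println(r.resolve("sales", "orders")); // Target[db=sales, table=orders, column=null]
        System.out.println(r.resolve("orders", "id"));    // Target[db=default, table=orders, column=id]
    }
}
```

The design choice matters because "a.b" is genuinely ambiguous; preferring database.table matches the MySQL-style syntax the issue cites while keeping the old table.column form working.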
[jira] [Updated] (HIVE-1977) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name
[ https://issues.apache.org/jira/browse/HIVE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo updated HIVE-1977: --- Status: Patch Available (was: Open) DESCRIBE TABLE syntax doesn't support specifying a database qualified table name Key: HIVE-1977 URL: https://issues.apache.org/jira/browse/HIVE-1977 Project: Hive Issue Type: Bug Components: Database/Schema, Query Processor, SQL Reporter: Carl Steinbach Assignee: Zhenxiao Luo Attachments: HIVE-1977.1.patch.txt, HIVE-1977.2.patch.txt The syntax for DESCRIBE is broken. It should be: {code} DESCRIBE [EXTENDED] [database DOT]table [column] {code} but is actually {code} DESCRIBE [EXTENDED] table[DOT col_name] {code} Ref: http://dev.mysql.com/doc/refman/5.0/en/describe.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-3525
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7430/ --- (Updated Oct. 4, 2012, 3:11 a.m.) Review request for hive. Changes --- Now includes a proposed fix, changing internal Hashtable use to HashMap. Summary (updated) - HIVE-3525 Description (updated) --- Changes Avro SerDe to use HashMap when copying out the Avro Map<Utf8, Object> to Map<String, Object>. Fixes HIVE-3525. Diffs (updated) - /trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 1393805 /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 1393805 /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java 1393805 /trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java 1393805 Diff: https://reviews.apache.org/r/7430/diff/ Testing (updated) --- Includes unit tests for * AvroObjectInspectorGenerator to verify that the Nullable value type is presented as just the non-null type. * AvroDeserializer to verify that Maps with null are properly handled * AvroSerializer to verify that Maps with null can round-trip. Thanks, Sean Busbey
[jira] [Commented] (HIVE-3525) Avro Maps with Nullable Values fail with NPE
[ https://issues.apache.org/jira/browse/HIVE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469112#comment-13469112 ] Sean Busbey commented on HIVE-3525: --- [Review Board #7430|https://reviews.apache.org/r/7430/] Now contains a proposed fix as well as tests. Avro Maps with Nullable Values fail with NPE Key: HIVE-3525 URL: https://issues.apache.org/jira/browse/HIVE-3525 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Sean Busbey Attachments: HIVE-3525.1.patch.txt When working against current trunk@1393794, using a backing Avro schema that has a Map field with nullable values causes a NPE on deserialization when the map contains a null value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
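The fix turns on a documented difference between the two JDK map classes: java.util.Hashtable rejects null keys and null values outright, while java.util.HashMap accepts null values. A minimal stand-alone Java illustration (not the SerDe code itself):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullableMapValues {
    public static void main(String[] args) {
        // Hashtable throws NPE on a null value, so copying an Avro map that
        // contains one into a Hashtable fails exactly as the issue describes.
        Map<String, Object> table = new Hashtable<>();
        try {
            table.put("optional", null);
        } catch (NullPointerException e) {
            System.out.println("Hashtable: NPE on null value");
        }

        // HashMap permits null values, which is why the patch switches to it.
        Map<String, Object> map = new HashMap<>();
        map.put("optional", null);
        System.out.println(map.containsKey("optional")); // true
    }
}
```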
[jira] [Commented] (HIVE-3501) Track table and keys used in joins and group bys for logging
[ https://issues.apache.org/jira/browse/HIVE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469133#comment-13469133 ] Namit Jain commented on HIVE-3501: -- @Carl, let me know if you are swamped with other issues. I can start the tests and commit it if everything goes fine. Track table and keys used in joins and group bys for logging Key: HIVE-3501 URL: https://issues.apache.org/jira/browse/HIVE-3501 Project: Hive Issue Type: Task Components: Query Processor Affects Versions: 0.10.0 Reporter: Sambavi Muthukrishnan Assignee: Sambavi Muthukrishnan Priority: Minor Attachments: table_access_keys.1.patch, table_access_keys.2.patch, table_access_keys.3.patch, table_access_keys.4.patch, table_access_keys.5.patch Original Estimate: 96h Remaining Estimate: 96h For all operators that could benefit from bucketing, it will be useful to keep track of and log the table names and key column names in order for the operator to be converted to the bucketed version. This task is to track this information for joins and group bys when the keys can be directly mapped back to table scans and columns on that table. This information will be tracked on the QueryPlan object so it is available to any pre/post execution hooks for logging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive
[ https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469134#comment-13469134 ] Namit Jain commented on HIVE-3433: -- The tests finished successfully Implement CUBE and ROLLUP operators in Hive --- Key: HIVE-3433 URL: https://issues.apache.org/jira/browse/HIVE-3433 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Sambavi Muthukrishnan Assignee: Namit Jain Attachments: hive.3433.1.patch, hive.3433.2.patch, hive.3433.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2874) Renaming external partition changes location
[ https://issues.apache.org/jira/browse/HIVE-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469135#comment-13469135 ] Namit Jain commented on HIVE-2874: -- You are right. The location should not be changed by renaming an external table's partition. Renaming external partition changes location Key: HIVE-2874 URL: https://issues.apache.org/jira/browse/HIVE-2874 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Zhenxiao Luo Attachments: HIVE-2874.1.patch.txt, HIVE-2874.2.patch.txt, HIVE-2874.3.patch.txt Renaming an external partition will change the location of that partition to the default location of a managed partition with the same name. E.g. If ex_table is external and has partition part=1 with location /.../managed_table/part=1 Calling ALTER TABLE ex_table PARTITION (part = '1') RENAME TO PARTITION (part = '2'); Will change the location of the partition to /.../ex_table/part=2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3514: --- Attachment: HIVE-3514.patch.3 Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3 Partition Pruner has logic reusable like 1. walk through operator tree 2. walk through operation tree 3. create pruning predicate The first candidate is list bucketing pruner. Some consideration: 1. refactor for general use case not just list bucketing 2. avoid over-refactor by focusing on pieces targeted for reuse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3514: - Status: Open (was: Patch Available) comments on phabricator Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3 Partition Pruner has logic reusable like 1. walk through operator tree 2. walk through operation tree 3. create pruning predicate The first candidate is list bucketing pruner. Some consideration: 1. refactor for general use case not just list bucketing 2. avoid over-refactor by focusing on pieces targeted for reuse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469143#comment-13469143 ] shrikanth shankar commented on HIVE-1362: - I had a couple of high-level comments on the patch that seem to fit better here rather than on the review board. Apologies if this violates protocol. (1) The count_stats aggregation operator 'repeats' many existing aggregates that Hive already supports (count of nulls, count of trues, max, min, etc.). It might make a lot more sense to just add an aggregate to return the approximate number of distinct values for a column. Any reason why stats collection can't just generate more expressions in the SQL? (2) There might even be value in adding a different UDAF which just returns a serialized numDV estimator. Storing this (instead of the count) could be useful in other places, e.g. combining numDV estimates across partitions. (A second UDAF would be needed to support aggregating these, but that seems easy.) column level statistics --- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
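The second point — that a serialized numDV estimator merges across partitions while plain distinct counts do not — can be seen with an exact set standing in for the estimator's serialized state (illustrative Java, not the proposed UDAF):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NdvMerge {
    public static void main(String[] args) {
        // Two partitions of the same column; "b" and "c" appear in both.
        List<String> part1 = List.of("a", "b", "c");
        List<String> part2 = List.of("b", "c", "d");

        // Summing per-partition distinct counts over-counts shared values: 3 + 3 = 6.
        int summed = new HashSet<>(part1).size() + new HashSet<>(part2).size();

        // Merging the underlying state (here an exact set, standing in for a
        // serialized numDV sketch) yields the true table-level answer: 4.
        Set<String> merged = new HashSet<>(part1);
        merged.addAll(part2);

        System.out.println(summed);        // 6
        System.out.println(merged.size()); // 4
    }
}
```

This is exactly why storing the mergeable estimator, rather than the final count, is attractive for per-partition statistics.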
[jira] [Updated] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1362: - Status: Open (was: Patch Available) Questions on the jira ? column level statistics --- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2874) Renaming external partition changes location
[ https://issues.apache.org/jira/browse/HIVE-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469157#comment-13469157 ] Namit Jain commented on HIVE-2874: -- +1 Renaming external partition changes location Key: HIVE-2874 URL: https://issues.apache.org/jira/browse/HIVE-2874 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Zhenxiao Luo Attachments: HIVE-2874.1.patch.txt, HIVE-2874.2.patch.txt, HIVE-2874.3.patch.txt Renaming an external partition will change the location of that partition to the default location of a managed partition with the same name. E.g. If ex_table is external and has partition part=1 with location /.../managed_table/part=1 Calling ALTER TABLE ex_table PARTITION (part = '1') RENAME TO PARTITION (part = '2'); Will change the location of the partition to /.../ex_table/part=2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3514 started by Gang Tim Liu. Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, HIVE-3514.patch.4 Partition Pruner has logic reusable like 1. walk through operator tree 2. walk through operation tree 3. create pruning predicate The first candidate is list bucketing pruner. Some consideration: 1. refactor for general use case not just list bucketing 2. avoid over-refactor by focusing on pieces targeted for reuse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3514: --- Attachment: HIVE-3514.patch.4 Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, HIVE-3514.patch.4 Partition Pruner has logic reusable like 1. walk through operator tree 2. walk through operation tree 3. create pruning predicate The first candidate is list bucketing pruner. Some consideration: 1. refactor for general use case not just list bucketing 2. avoid over-refactor by focusing on pieces targeted for reuse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3514: --- Status: Patch Available (was: In Progress) patch is available on both places. Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, HIVE-3514.patch.4 Partition Pruner has logic reusable like 1. walk through operator tree 2. walk through operation tree 3. create pruning predicate The first candidate is list bucketing pruner. Some consideration: 1. refactor for general use case not just list bucketing 2. avoid over-refactor by focusing on pieces targeted for reuse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3530) warnings in Hive.g
Namit Jain created HIVE-3530: Summary: warnings in Hive.g Key: HIVE-3530 URL: https://issues.apache.org/jira/browse/HIVE-3530 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain [echo] Building Grammar /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g [java] ANTLR Parser Generator Version 3.0.1 (August 13, 2007) 1989-2007 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5: Decision can match input such as Identifier KW_RENAME KW_TO using multiple alternatives: 1, 10 [java] As a result, alternative(s) 10 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5: Decision can match input such as Identifier DOT Identifier using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5: Decision can match input such as KW_ORDER KW_BY LPAREN using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1836:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1848:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1859:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input Most of these seem to be due to HIVE-1367
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3530) warnings in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3530: - Description: Building Grammar /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g ANTLR Parser Generator Version 3.0.1 (August 13, 2007) 1989-2007 warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5: Decision can match input such as Identifier KW_RENAME KW_TO using multiple alternatives: 1, 10 As a result, alternative(s) 10 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5: Decision can match input such as Identifier DOT Identifier using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5: Decision can match input such as KW_ORDER KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1836:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1848:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1859:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input Most of these seem to be due to HIVE-1367 was: [echo] Building Grammar /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g [java] ANTLR Parser Generator Version 3.0.1
(August 13, 2007) 1989-2007 [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5: Decision can match input such as Identifier KW_RENAME KW_TO using multiple alternatives: 1, 10 [java] As a result, alternative(s) 10 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5: Decision can match input such as Identifier DOT Identifier using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5: Decision can match input such as KW_ORDER KW_BY LPAREN using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1836:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1848:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input [java] warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1859:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2 [java] As a result, alternative(s) 2 were disabled for that input Most of these seem to be due to HIVE-1367 warnings in Hive.g -- Key: HIVE-3530 URL: https://issues.apache.org/jira/browse/HIVE-3530 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Building Grammar /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g ANTLR Parser Generator Version 3.0.1 (August 13, 2007) 1989-2007 warning(200):
/Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5: Decision can match input such as Identifier KW_RENAME KW_TO using multiple alternatives: 1, 10 As a result, alternative(s) 10 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5: Decision can match input such as Identifier DOT Identifier using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5: Decision
[jira] [Commented] (HIVE-3530) warnings in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469160#comment-13469160 ] Namit Jain commented on HIVE-3530: -- [~zhenxiao], can you take a look if possible? warnings in Hive.g -- Key: HIVE-3530 URL: https://issues.apache.org/jira/browse/HIVE-3530 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Building Grammar /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g ANTLR Parser Generator Version 3.0.1 (August 13, 2007) 1989-2007 warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:578:5: Decision can match input such as Identifier KW_RENAME KW_TO using multiple alternatives: 1, 10 As a result, alternative(s) 10 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1607:5: Decision can match input such as Identifier DOT Identifier using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1823:5: Decision can match input such as KW_ORDER KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1836:5: Decision can match input such as KW_CLUSTER KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1848:5: Decision can match input such as KW_DISTRIBUTE KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): /Users/njain/hive/hive3/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1859:5: Decision can match input such as KW_SORT KW_BY LPAREN using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input Most of these seem to be due to HIVE-1367 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain reassigned HIVE-3514: Assignee: Gang Tim Liu more comments Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, HIVE-3514.patch.4 Partition Pruner has logic reusable like 1. walk through operator tree 2. walk through operation tree 3. create pruning predicate The first candidate is list bucketing pruner. Some consideration: 1. refactor for general use case not just list bucketing 2. avoid over-refactor by focusing on pieces targeted for reuse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3514: - Assignee: (was: Gang Tim Liu) Status: Open (was: Patch Available) Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, HIVE-3514.patch.4 Partition Pruner has logic reusable like 1. walk through operator tree 2. walk through operation tree 3. create pruning predicate The first candidate is list bucketing pruner. Some consideration: 1. refactor for general use case not just list bucketing 2. avoid over-refactor by focusing on pieces targeted for reuse -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-1362: Support for column statistics in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6878/ --- (Updated Oct. 4, 2012, 5:45 a.m.) Review request for hive and Carl Steinbach. Changes --- Previous version of the patch didn't render correctly. This version fixes that problem. Sorry about the earlier version. Description --- This patch implements version 1 of the column statistics project in Hive. It adds support for computing and persisting statistical summary of column values in Hive Tables and Partitions. In order to support column statistics in Hive, this patch does the following: * Adds a new compute stats UDAF to compute scalar statistics for all primitive Hive data types. In version 1 of the project, we support the following scalar statistics on primitive types - estimate of number of distinct values, number of null values, number of trues/falses for boolean typed columns, max and avg length for string and binary typed columns, max and min value for long and double typed columns. Note that version 1 of the column stats project includes support for column statistics both at the table and partition level. * Adds Metastore schema tables to persist the newly added statistics both at table and partition level. * Adds Metastore Thrift API to persist, retrieve and delete column statistics at both table and partition level. Please refer to the following wiki link for the details of the schema and the Thrift API changes - https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive * Extends the analyze table compute statistics statement to trigger statistics computation and persistence for one or more columns. Please note that statistics for multiple columns are computed through a single scan of the table data.
Please refer to the following wiki link for the syntax changes - https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive One thing missing from the patch at this point is the metastore upgrade scripts for MySQL/Derby/Postgres/Oracle. I'm waiting for the review to finalize the metastore schema changes before I go ahead and add the upgrade scripts. In a follow-on patch, as part of version 2 of the column statistics project, we will add support for computing, persisting and retrieving histograms on long and double typed column values. Generated Thrift files have been removed for viewing pleasure. JIRA page has the patch with the generated Thrift files. This addresses bug HIVE-1362. https://issues.apache.org/jira/browse/HIVE-1362 Diffs (updated) - data/files/UserVisits.dat PRE-CREATION data/files/binary.txt PRE-CREATION data/files/bool.txt PRE-CREATION data/files/double.txt PRE-CREATION data/files/employee.dat PRE-CREATION data/files/employee2.dat PRE-CREATION data/files/int.txt PRE-CREATION ivy/libraries.properties 7ac6778 metastore/if/hive_metastore.thrift d4fad72 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 8fec13d metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 17b986c metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 3883b5b metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java eff44b1 metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf5ae3a metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 77d1caa metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java PRE-CREATION metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java PRE-CREATION metastore/src/model/package.jdo 38ce6d5 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 528a100 metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 925938d
ql/build.xml 5de3f78
ql/if/queryplan.thrift 05fbf58
ql/ivy.xml aa3b8ce
ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 425900d
ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 4446952
ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 79b87f1
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7440889
ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java 0b55ac4
ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 344dc69
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f7257cd
ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java e75a075
ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java 61bc7fd
ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 6024dd4
ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
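The review above mentions that version 2 will add histograms on long and double typed columns. As a hedged sketch of what such a summary could look like, here is a simple equi-width histogram builder; this is illustrative only and is not the design described in the patch or the wiki.

```python
def equi_width_histogram(values, num_buckets):
    """Bucket numeric values into num_buckets equal-width ranges.

    Returns (lo, width, counts): the lower bound, bucket width, and the
    per-bucket row counts.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1   # guard against a constant column
    counts = [0] * num_buckets
    for v in values:
        # clamp the top value into the last bucket
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts

lo, width, counts = equi_width_histogram([1, 2, 3, 4, 5, 6, 7, 8], 4)
print(counts)   # [2, 2, 2, 2]
```

An optimizer can use such bucket counts to estimate predicate selectivity (e.g. what fraction of rows satisfy `v > 6`) far more accurately than min/max alone.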
[jira] [Updated] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1362: - Attachment: HIVE-1362.4.patch.txt column level statistics --- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1362: - Attachment: HIVE-1362-gen_thrift.4.patch.txt
RE: Hive Connection Error
Hi, how do I unsubscribe? Thanks. Regards, Deepak Talim Architect | Analytics Information Management | Wipro Technologies | Pune Phone - VOIP: 8547081 | D: +91 +20 +39132608 | M: +91 98816 90900

-Original Message-
From: deepak.ta...@wipro.com [mailto:deepak.ta...@wipro.com]
Sent: Tuesday, July 03, 2012 5:44 PM
To: dev@hive.apache.org
Subject: Hive Connection Error

Hi, while trying to connect to Hive using the Talend 5 'HiveConnection', I get the following error.

Technical details: Cloudera ver CDH3, Apache ver 1, Hive ver 7.0

HDFS error: while connecting, it tries to create a directory named after the Windows AD user id under the HDFS tmp directory. What permissions need to be granted, or what is the solution?

Error details:
===
[statistics] connecting to socket on port 3338
[statistics] connected
12/07/03 16:49:53 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
12/07/03 16:49:54 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
12/07/03 16:49:54 INFO metastore.ObjectStore: ObjectStore, initialize called
12/07/03 16:49:54 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
12/07/03 16:49:54 INFO DataNucleus.Persistence: Property javax.jdo.option.NonTransactionalRead unknown - will be ignored
12/07/03 16:49:54 INFO DataNucleus.Persistence: = Persistence Configuration ===
12/07/03 16:49:54 INFO DataNucleus.Persistence: DataNucleus Persistence Factory - Vendor: DataNucleus Version: 2.0.3
12/07/03 16:49:54 INFO DataNucleus.Persistence: DataNucleus Persistence Factory initialised for datastore URL=jdbc:derby:;databaseName=metastore_db;create=true driver=org.apache.derby.jdbc.EmbeddedDriver userName=APP
12/07/03 16:49:54 INFO DataNucleus.Persistence: ===
12/07/03 16:49:57 INFO Datastore.Schema: Initialising Catalog , Schema APP using None auto-start option
12/07/03 16:49:57 INFO Datastore.Schema: Catalog , Schema APP initialised - managing 0 classes
12/07/03 16:49:57 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order
12/07/03 16:49:57 INFO DataNucleus.MetaData: Registering listener for metadata initialisation
12/07/03 16:49:57 INFO metastore.ObjectStore: Initialized ObjectStore
12/07/03 16:49:58 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo at line 11, column 6 : cvc-elt.1: Cannot find the declaration of element 'jdo'. - Please check your specification of DTD and the validity of the MetaData XML that you have specified.
12/07/03 16:49:58 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo at line 321, column 13 : The content of element type class must match (extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*). - Please check your specification of DTD and the validity of the MetaData XML that you have specified.
12/07/03 16:49:58 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo at line 368, column 13 : The content of element type class must match (extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*). - Please check your specification of DTD and the validity of the MetaData XML that you have specified.
12/07/03 16:49:59 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo at line 390, column 13 : The content of element type class must match (extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*). - Please check your specification of DTD and the validity of the MetaData XML that you have specified.
12/07/03 16:49:59 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file jar:file:/D:/Deepak%20Talim/Technical/Talend5/Talend%20Project/.Java/lib/hive-metastore-0.8.1.jar!/package.jdo at line 425, column 13 : The content of element type class must match (extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*).
[jira] [Work started] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3514 started by Gang Tim Liu. Refactor Partition Pruner so that logic can be reused. -- Key: HIVE-3514 URL: https://issues.apache.org/jira/browse/HIVE-3514 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Gang Tim Liu Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3514.patch, HIVE-3514.patch.2, HIVE-3514.patch.3, HIVE-3514.patch.4, HIVE-3514.patch.5 The Partition Pruner has reusable logic, such as: 1. walking through the operator tree 2. walking through the operation tree 3. creating the pruning predicate The first candidate for reuse is the list bucketing pruner. Some considerations: 1. refactor for the general use case, not just list bucketing 2. avoid over-refactoring by focusing on the pieces targeted for reuse
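The three reusable steps listed in the issue can be sketched generically. The sketch below is hypothetical and greatly simplified (the `OpNode` class, node kinds, and string-based conditions are illustrative, not Hive's actual operator model): it walks an operator tree, collects filter conditions that mention the pruning column, and combines them into a single pruning predicate.

```python
class OpNode:
    """A toy operator-tree node: kind, optional filter condition, children."""
    def __init__(self, kind, condition=None, children=()):
        self.kind = kind            # e.g. "TS" (table scan), "FIL" (filter)
        self.condition = condition  # predicate string for filter operators
        self.children = list(children)

def walk(node, visit):
    """Depth-first walk of the operator tree, calling visit on every node."""
    visit(node)
    for child in node.children:
        walk(child, visit)

def build_pruning_predicate(root, prune_col):
    """Collect filter conditions mentioning prune_col and AND them together."""
    conds = []
    def visit(node):
        if node.kind == "FIL" and node.condition and prune_col in node.condition:
            conds.append(node.condition)
    walk(root, visit)
    return " AND ".join(conds) if conds else "true"

# Example plan: TS -> FIL(ds = '2012-10-04') -> FIL(hr > 10)
tree = OpNode("TS", children=[
    OpNode("FIL", "ds = '2012-10-04'", children=[
        OpNode("FIL", "hr > 10")])])
print(build_pruning_predicate(tree, "ds"))   # ds = '2012-10-04'
```

Separating the generic walk (`walk`) from the predicate-building visitor is the kind of split the issue proposes: the same traversal could then serve both partition pruning and a list bucketing pruner.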
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3514: --- Status: Patch Available (was: In Progress) patch is available in both places.
[jira] [Updated] (HIVE-3514) Refactor Partition Pruner so that logic can be reused.
[ https://issues.apache.org/jira/browse/HIVE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3514: --- Attachment: HIVE-3514.patch.5