[jira] [Updated] (HIVE-4042) ignore mapjoin hint
[ https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-4042:

    Attachment: hive.4042.2.patch

ignore mapjoin hint

    Key: HIVE-4042
    URL: https://issues.apache.org/jira/browse/HIVE-4042
    Project: Hive
    Issue Type: Improvement
    Components: Query Processor
    Reporter: Namit Jain
    Assignee: Namit Jain
    Attachments: hive.4042.1.patch, hive.4042.2.patch

After HIVE-3784, in a production environment, it can become difficult to deploy since a lot of production queries can break.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3938) Hive MetaStore should send a single AddPartitionEvent for atomically added partition-set.
[ https://issues.apache.org/jira/browse/HIVE-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3938:

    Status: Open (was: Patch Available)

Can you refresh once HIVE-4004 is in?

Hive MetaStore should send a single AddPartitionEvent for atomically added partition-set.

    Key: HIVE-3938
    URL: https://issues.apache.org/jira/browse/HIVE-3938
    Project: Hive
    Issue Type: Bug
    Components: Metastore
    Affects Versions: 0.10.0
    Reporter: Mithun Radhakrishnan
    Assignee: Mithun Radhakrishnan
    Attachments: HIVE-3938.patch

HiveMetaStore::add_partitions() currently adds all partitions specified in one call using a single meta-store transaction. This behaves correctly. However, one AddPartitionEvent is created per partition specified. Ideally, the set of partitions added atomically would be communicated using a single AddPartitionEvent, so that they are consumed together. I'll post a patch that does this.
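For readers following HIVE-3938, the requested behavior can be sketched roughly as below. This is only an illustration under stated assumptions: `AddPartitionsEvent`, `RecordingListener`, and `addPartitions` are hypothetical names invented here, not the real Hive metastore classes or the contents of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the real Hive metastore classes.
final class PartitionBatchEventSketch {

    /** One event carrying every partition added in the same transaction. */
    static final class AddPartitionsEvent {
        final String table;
        final List<String> partitions;

        AddPartitionsEvent(String table, List<String> partitions) {
            this.table = table;
            this.partitions = List.copyOf(partitions);
        }
    }

    /** Listener that simply records the events it receives. */
    static final class RecordingListener {
        final List<AddPartitionsEvent> received = new ArrayList<>();

        void onAddPartitions(AddPartitionsEvent event) {
            received.add(event);
        }
    }

    /** Notifies the listener once for the whole set instead of once per partition. */
    static void addPartitions(String table, List<String> partitions, RecordingListener listener) {
        // The single metastore transaction would run here; it is omitted in this sketch.
        listener.onAddPartitions(new AddPartitionsEvent(table, partitions));
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        addPartitions("logs", List.of("ds=2013-02-19", "ds=2013-02-20"), listener);
        System.out.println(listener.received.size() + " event for " + listener.received.get(0).partitions);
    }
}
```

A listener written this way sees the atomically added set as one unit, which is the property the issue asks for.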
[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails
[ https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582054#comment-13582054 ]

Namit Jain commented on HIVE-4004:

+1

Incorrect status for AddPartition metastore event if RawStore commit fails

    Key: HIVE-4004
    URL: https://issues.apache.org/jira/browse/HIVE-4004
    Project: Hive
    Issue Type: Bug
    Components: Metastore
    Affects Versions: 0.10.0
    Reporter: Dilip Joseph
    Assignee: Dilip Joseph
    Priority: Minor
    Fix For: 0.11.0
    Attachments: HIVE-4004.1.patch.txt

For ADD PARTITION operations, the AddPartitionEvent does not care if the RawStore commit succeeded or not. This means that an AddPartitionEvent with status=true is fired even if the actual ADD PARTITION operation failed. This will confuse any AddPartitionEvent listeners. Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the status of the RawStore commit. Only AddPartitionEvent has this problem.
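As a rough illustration of the fix described in HIVE-4004, the event's status can be derived from the commit result rather than hard-coded. The names below (`AddPartitionEvent` as a plain value class, `addPartition`, and the `BooleanSupplier` standing in for the RawStore commit) are assumptions made for this sketch and do not reproduce the real metastore code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-ins for illustration; not the real Hive metastore classes.
final class CommitStatusEventSketch {

    /** Minimal event: which partition was attempted and whether the commit succeeded. */
    static final class AddPartitionEvent {
        final String partition;
        final boolean status;

        AddPartitionEvent(String partition, boolean status) {
            this.partition = partition;
            this.status = status;
        }
    }

    /** Builds the event only after the commit attempt, so status mirrors its outcome. */
    static AddPartitionEvent addPartition(String partition, BooleanSupplier commit) {
        boolean committed = commit.getAsBoolean();
        return new AddPartitionEvent(partition, committed);
    }

    public static void main(String[] args) {
        System.out.println(addPartition("ds=2013-02-19", () -> true).status);
        System.out.println(addPartition("ds=2013-02-20", () -> false).status);
    }
}
```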
[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map
[ https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582066#comment-13582066 ]

David Worms commented on HIVE-2843:

I just created the requested Phabricator entry: https://reviews.facebook.net/T45. I did my best, but arc wasn't working for me; it failed with a message like "libphutil v1 libraries are no longer supported". I tried a workaround described on the mailing list (http://mail-archives.apache.org/mod_mbox/hive-dev/201301.mbox/%3CFF1DF58D04F11D4291D09795D1A4EF1618657D12DB@SRV-MAIL%3E), also without success, so I ended up creating the patch and uploading it manually.

UDAF to convert an aggregation to a map

    Key: HIVE-2843
    URL: https://issues.apache.org/jira/browse/HIVE-2843
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Affects Versions: 0.9.0, 0.10.0
    Reporter: David Worms
    Priority: Minor
    Labels: features, udf
    Attachments: HIVE-2843.1.patch.txt

I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. The source code is available on GitHub at https://github.com/wdavidw/hive-udf in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function converts an aggregation into a map and internally uses a Java `HashMap`. The second function extends the first one: it converts an aggregation into an ordered map and internally uses a Java `TreeMap`. They both extend the `AbstractGenericUDAFResolver` class. Also, I have covered the motivations and usages of those UDAFs in a blog post at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/

The full patch is available with tests as well.
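The difference between the two proposed functions comes down to the map implementation that collects the rows. The plain-Java sketch below shows that idea only; it is not the UDAF code from the patch, the `aggregate` helper is a hypothetical name, and letting later rows overwrite earlier ones for the same key is an assumption made here rather than documented behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Plain-Java illustration; this is not the UDAFToMap / UDAFToOrderedMap source.
final class ToMapSketch {

    /** Folds (key, value) rows into the map produced by the given factory. */
    static <K, V> Map<K, V> aggregate(List<Map.Entry<K, V>> rows, Supplier<Map<K, V>> factory) {
        Map<K, V> result = factory.get();
        for (Map.Entry<K, V> row : rows) {
            // Assumption for this sketch: a later row replaces an earlier one with the same key.
            result.put(row.getKey(), row.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> rows = List.of(
                Map.entry("pear", 3), Map.entry("apple", 1), Map.entry("fig", 2));

        // HashMap: iteration order is unspecified.
        Map<String, Integer> unordered = aggregate(rows, HashMap::new);
        // TreeMap: iteration follows the natural ordering of the keys.
        Map<String, Integer> ordered = aggregate(rows, TreeMap::new);

        System.out.println(unordered);
        System.out.println(ordered); // {apple=1, fig=2, pear=3}
    }
}
```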
[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener
[ https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3970:

    Status: Open (was: Patch Available)

Can you refresh? This patch is not applying cleanly anymore.

Clean up/fix PartitionNameWhitelistPreEventListener

    Key: HIVE-3970
    URL: https://issues.apache.org/jira/browse/HIVE-3970
    Project: Hive
    Issue Type: Improvement
    Components: Metastore
    Affects Versions: 0.11.0
    Reporter: Kevin Wilfong
    Assignee: Kevin Wilfong
    Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt

There are a number of issues and things which can be cleaned up related to PartitionNameWhitelistPreEventListener.
* It's an event listener, but it really doesn't need to be. Given that the regex whitelist is configurable, it could just be a utility method.
* It's not run when a partition is renamed, so partitions with invalid characters can be created in this way.
* There's no easy way to check if a partition contains invalid characters before creating it and seeing if it fails.

Most importantly, when a dynamic partition contains an invalid character, the directory for this partition is created, and the data is moved into it, but the partition fails to be created, leaving an orphan directory.
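The "utility method" suggested in the first bullet could look something like the sketch below. It is a guess at the shape only: `firstInvalidValue` and the example pattern are hypothetical and are not taken from the attached patches or from Hive's configuration defaults.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the code in the HIVE-3970 patches.
final class PartitionNameWhitelistSketch {

    /** Returns the first partition value that does not fully match the whitelist, or null if all match. */
    static String firstInvalidValue(List<String> partitionValues, Pattern whitelist) {
        for (String value : partitionValues) {
            if (!whitelist.matcher(value).matches()) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Example whitelist chosen for this sketch: letters, digits, underscore, hyphen.
        Pattern whitelist = Pattern.compile("[A-Za-z0-9_-]+");
        System.out.println(firstInvalidValue(List.of("2013-02-19", "us_west"), whitelist));
        System.out.println(firstInvalidValue(List.of("2013-02-19", "a/b"), whitelist));
    }
}
```

Because it is a plain static method, the same check could be called before a rename or before a dynamic partition's directory is created, which is where the issue says validation is currently missing.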
[jira] [Commented] (HIVE-3672) Support altering partition column type in Hive
[ https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582078#comment-13582078 ]

Namit Jain commented on HIVE-3672:

The patch is still not applying cleanly for me.

Support altering partition column type in Hive

    Key: HIVE-3672
    URL: https://issues.apache.org/jira/browse/HIVE-3672
    Project: Hive
    Issue Type: Improvement
    Components: CLI, SQL
    Reporter: Jingwei Lu
    Assignee: Jingwei Lu
    Labels: features
    Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt
    Original Estimate: 72h
    Remaining Estimate: 72h

Currently, Hive does not allow altering partition column types. As we've discouraged users from using non-string partition column types, this presents a problem for users who want to change their partition columns to be strings: they have to rename their table, create a new table, and copy all the data over. To support this via the CLI, add a command like

ALTER TABLE table_name PARTITION COLUMN (column_name new type);
[jira] [Updated] (HIVE-3672) Support altering partition column type in Hive
[ https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3672:

    Status: Open (was: Patch Available)

Support altering partition column type in Hive

    Key: HIVE-3672
    URL: https://issues.apache.org/jira/browse/HIVE-3672
    Project: Hive
    Issue Type: Improvement
    Components: CLI, SQL
    Reporter: Jingwei Lu
    Assignee: Jingwei Lu
    Labels: features
    Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt
    Original Estimate: 72h
    Remaining Estimate: 72h

Currently, Hive does not allow altering partition column types. As we've discouraged users from using non-string partition column types, this presents a problem for users who want to change their partition columns to be strings: they have to rename their table, create a new table, and copy all the data over. To support this via the CLI, add a command like

ALTER TABLE table_name PARTITION COLUMN (column_name new type);
[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.
[ https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582083#comment-13582083 ]

Namit Jain commented on HIVE-4039:

+1

Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

    Key: HIVE-4039
    URL: https://issues.apache.org/jira/browse/HIVE-4039
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Jean Xu
    Assignee: Jean Xu
    Priority: Minor
    Attachments: HIVE_4039.1.patch.txt

Hive compiler fails with a NullPointerException in semantic analysis / optimisation stage when a boolean variable appears in the WHERE clause in some cases. A minimal query to generate this error is here:

SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag;

On the other hand, the following query is perfectly fine:

SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag=TRUE;
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582121#comment-13582121 ]

Namit Jain commented on HIVE-3874:

Can you fix Eclipse also?

Create a new Optimized Row Columnar file format for Hive

    Key: HIVE-3874
    URL: https://issues.apache.org/jira/browse/HIVE-3874
    Project: Hive
    Issue Type: Improvement
    Components: Serializers/Deserializers
    Reporter: Owen O'Malley
    Assignee: Owen O'Malley
    Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, HIVE-3874.D8529.2.patch, OrcFileIntro.pptx, orc.tgz

There are several limitations of the current RC File format that I'd like to address by creating a new format:
* each column value is stored as a binary blob, which means:
** the entire column value must be read, decompressed, and deserialized
** the file format can't use smarter type-specific compression
** push down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per file or row group
* there is no mechanism for seeking to a particular row number, which is required for external indexes.
* there is no mechanism for storing light weight indexes within the file to enable push-down filters to skip entire row groups.
* the type of the rows isn't stored in the file
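To make the "light weight indexes" bullet concrete, here is a small sketch of how per-row-group minimum and maximum values let a reader skip groups that cannot satisfy an equality filter. It is illustrative only: `MinMax` and `rowGroupsToRead` are hypothetical names, and nothing here describes the actual on-disk layout proposed in the attachments.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: this is not the index layout of the proposed file format.
final class RowGroupIndexSketch {

    /** Minimum and maximum of one column within one row group. */
    static final class MinMax {
        final long min;
        final long max;

        MinMax(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Returns the positions of row groups whose [min, max] range could contain the target value. */
    static List<Integer> rowGroupsToRead(List<MinMax> index, long target) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < index.size(); i++) {
            MinMax stats = index.get(i);
            if (target >= stats.min && target <= stats.max) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<MinMax> index = List.of(new MinMax(0, 99), new MinMax(100, 199), new MinMax(150, 300));
        // A filter such as "col = 160" only needs row groups 1 and 2; group 0 is skipped.
        System.out.println(rowGroupsToRead(index, 160)); // [1, 2]
    }
}
```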
[jira] [Updated] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails
[ https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-4004:

    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed. Thanks Dilip

Incorrect status for AddPartition metastore event if RawStore commit fails

    Key: HIVE-4004
    URL: https://issues.apache.org/jira/browse/HIVE-4004
    Project: Hive
    Issue Type: Bug
    Components: Metastore
    Affects Versions: 0.10.0
    Reporter: Dilip Joseph
    Assignee: Dilip Joseph
    Priority: Minor
    Fix For: 0.11.0
    Attachments: HIVE-4004.1.patch.txt

For ADD PARTITION operations, the AddPartitionEvent does not care if the RawStore commit succeeded or not. This means that an AddPartitionEvent with status=true is fired even if the actual ADD PARTITION operation failed. This will confuse any AddPartitionEvent listeners. Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the status of the RawStore commit. Only AddPartitionEvent has this problem.
[jira] [Updated] (HIVE-4042) ignore mapjoin hint
[ https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-4042:

    Status: Patch Available (was: Open)

Tests passed

ignore mapjoin hint

    Key: HIVE-4042
    URL: https://issues.apache.org/jira/browse/HIVE-4042
    Project: Hive
    Issue Type: Improvement
    Components: Query Processor
    Reporter: Namit Jain
    Assignee: Namit Jain
    Attachments: hive.4042.1.patch, hive.4042.2.patch

After HIVE-3784, in a production environment, it can become difficult to deploy since a lot of production queries can break.
Hive-trunk-h0.21 - Build # 1977 - Still Failing
Changes for Build #1975
[namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect name (Jarek Jarcec Cecho via namit)
[hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after joining three tables on different keys (Ashutosh Chauhan)
[namit] HIVE-4029 Hive Profiler dies with NPE (Brock Noland via namit)

Changes for Build #1976
[namit] HIVE-4023 Improve Error Logging in MetaStore (Bhushan Mandhani via namit)
[namit] HIVE-3403 user should not specify mapjoin to perform sort-merge bucketed join (Namit Jain via Ashutosh)
[namit] HIVE-4024 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0 (Jarek Jarcec Cecho via namit)

Changes for Build #1977

1 tests failed.
FAILED: org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
    at net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259)
    at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268)
    at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:299)
    at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244)

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1977)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1977/ to view the results.
[jira] [Updated] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.
[ https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-4039:

    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed. Thanks Jean

Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

    Key: HIVE-4039
    URL: https://issues.apache.org/jira/browse/HIVE-4039
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Jean Xu
    Assignee: Jean Xu
    Priority: Minor
    Attachments: HIVE_4039.1.patch.txt

Hive compiler fails with a NullPointerException in semantic analysis / optimisation stage when a boolean variable appears in the WHERE clause in some cases. A minimal query to generate this error is here:

SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag;

On the other hand, the following query is perfectly fine:

SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag=TRUE;
[jira] [Updated] (HIVE-4027) Thrift alter_table api doesnt validate column type
[ https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-4027:

    Resolution: Fixed
    Fix Version/s: 0.11.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed. Thanks Tim

Thrift alter_table api doesnt validate column type

    Key: HIVE-4027
    URL: https://issues.apache.org/jira/browse/HIVE-4027
    Project: Hive
    Issue Type: Bug
    Reporter: Gang Tim Liu
    Assignee: Gang Tim Liu
    Fix For: 0.11.0
    Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3

The Thrift alter_table API doesn't validate the column type, so an invalid column type can sneak in.
[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type
[ https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582185#comment-13582185 ]

Hudson commented on HIVE-4027:

Integrated in hive-trunk-hadoop1 #93 (See [https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4027 Thrift alter_table api doesnt validate column type (Gang Tim Liu via namit) (Revision 1448138)

Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138
Files :
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java

Thrift alter_table api doesnt validate column type

    Key: HIVE-4027
    URL: https://issues.apache.org/jira/browse/HIVE-4027
    Project: Hive
    Issue Type: Bug
    Reporter: Gang Tim Liu
    Assignee: Gang Tim Liu
    Fix For: 0.11.0
    Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3

The Thrift alter_table API doesn't validate the column type, so an invalid column type can sneak in.
[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.
[ https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582184#comment-13582184 ]

Hudson commented on HIVE-4039:

Integrated in hive-trunk-hadoop1 #93 (See [https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause. (Jezn Xu via namit) (Revision 1448135)

Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q
* /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out

Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.

    Key: HIVE-4039
    URL: https://issues.apache.org/jira/browse/HIVE-4039
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Jean Xu
    Assignee: Jean Xu
    Priority: Minor
    Attachments: HIVE_4039.1.patch.txt

Hive compiler fails with a NullPointerException in semantic analysis / optimisation stage when a boolean variable appears in the WHERE clause in some cases. A minimal query to generate this error is here:

SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag;

On the other hand, the following query is perfectly fine:

SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag=TRUE;
[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails
[ https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582186#comment-13582186 ]

Hudson commented on HIVE-4004:

Integrated in hive-trunk-hadoop1 #93 (See [https://builds.apache.org/job/hive-trunk-hadoop1/93/])
HIVE-4004 Incorrect status for AddPartition metastore event if RawStore commit fails (Dilip Joseph via namit) (Revision 1448101)

Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101
Files :
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java

Incorrect status for AddPartition metastore event if RawStore commit fails

    Key: HIVE-4004
    URL: https://issues.apache.org/jira/browse/HIVE-4004
    Project: Hive
    Issue Type: Bug
    Components: Metastore
    Affects Versions: 0.10.0
    Reporter: Dilip Joseph
    Assignee: Dilip Joseph
    Priority: Minor
    Fix For: 0.11.0
    Attachments: HIVE-4004.1.patch.txt

For ADD PARTITION operations, the AddPartitionEvent does not care if the RawStore commit succeeded or not. This means that an AddPartitionEvent with status=true is fired even if the actual ADD PARTITION operation failed. This will confuse any AddPartitionEvent listeners. Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the status of the RawStore commit. Only AddPartitionEvent has this problem.
[jira] [Commented] (HIVE-4007) Create abstract classes for serializer and deserializer
[ https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582260#comment-13582260 ]

Jarek Jarcec Cecho commented on HIVE-4007:

+1 (non-binding)

Thank you for working on this, Namit!

Jarcec

Create abstract classes for serializer and deserializer

    Key: HIVE-4007
    URL: https://issues.apache.org/jira/browse/HIVE-4007
    Project: Hive
    Issue Type: Improvement
    Components: Serializers/Deserializers
    Reporter: Namit Jain
    Assignee: Namit Jain
    Attachments: hive.4007.1.patch, hive.4007.2.patch, hive.4007.3.patch

Currently, it is very difficult to change the Serializer/Deserializer interface, since all the SerDes directly implement the interface. Instead, we should have abstract classes for implementing these interfaces. In case of an interface change, only the abstract class and the relevant SerDe need to change.
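The design argument in HIVE-4007 can be sketched with a toy hierarchy. All type names below (`SketchDeserializer`, `SketchAbstractDeserializer`, `SketchUtf8Deserializer`) are hypothetical and far simpler than Hive's real SerDe interfaces; the point is only that a method added to the interface later gets a default in the abstract class, so existing concrete classes keep compiling unchanged.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names for illustration; these are not Hive's real SerDe types.
final class AbstractSerDeSketch {
    public static void main(String[] args) {
        SketchDeserializer d = new SketchUtf8Deserializer();
        System.out.println(d.describe() + " -> " + d.deserialize("row".getBytes(StandardCharsets.UTF_8)));
    }
}

interface SketchDeserializer {
    Object deserialize(byte[] data);

    // Imagine this method was added to the interface at a later date.
    String describe();
}

/** Absorbs interface changes so that concrete implementations do not each need editing. */
abstract class SketchAbstractDeserializer implements SketchDeserializer {
    @Override
    public String describe() {
        return getClass().getSimpleName();
    }
}

/** A concrete implementation overrides only what it cares about. */
final class SketchUtf8Deserializer extends SketchAbstractDeserializer {
    @Override
    public Object deserialize(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}
```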
[jira] [Commented] (HIVE-3980) Cleanup after HIVE-3403
[ https://issues.apache.org/jira/browse/HIVE-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582267#comment-13582267 ]

Jarek Jarcec Cecho commented on HIVE-3980:

+1 (non-binding)

Seems like a reasonable change to me.

Jarcec

Cleanup after HIVE-3403

    Key: HIVE-3980
    URL: https://issues.apache.org/jira/browse/HIVE-3980
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Namit Jain
    Assignee: Namit Jain
    Attachments: hive.3980.1.patch, hive.3980.2.patch

There have been a lot of comments on HIVE-3403, which involve changing variable names/function names/adding more comments/general cleanup etc. Since HIVE-3403 involves a lot of refactoring, it was fairly difficult to address the comments there, since refreshing becomes impossible. This jira is to track those cleanups.
[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type
[ https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582270#comment-13582270 ]

Gang Tim Liu commented on HIVE-4027:

Namit, thank you very much.

Thrift alter_table api doesnt validate column type

    Key: HIVE-4027
    URL: https://issues.apache.org/jira/browse/HIVE-4027
    Project: Hive
    Issue Type: Bug
    Reporter: Gang Tim Liu
    Assignee: Gang Tim Liu
    Fix For: 0.11.0
    Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3

The Thrift alter_table API doesn't validate the column type, so an invalid column type can sneak in.
[jira] [Commented] (HIVE-3672) Support altering partition column type in Hive
[ https://issues.apache.org/jira/browse/HIVE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582349#comment-13582349 ]

Jingwei Lu commented on HIVE-3672:

Is there a merge conflict or a unit test failure? If a test fails, could you give me its name? I ran all my newly added tests yesterday and they were clean.

Support altering partition column type in Hive

    Key: HIVE-3672
    URL: https://issues.apache.org/jira/browse/HIVE-3672
    Project: Hive
    Issue Type: Improvement
    Components: CLI, SQL
    Reporter: Jingwei Lu
    Assignee: Jingwei Lu
    Labels: features
    Attachments: HIVE-3672.1.patch.txt, HIVE-3672.2.patch.txt, HIVE-3672.3.patch.txt, HIVE-3672.4.patch.txt, HIVE-3672.5.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.6.patch.txt, HIVE-3672.7.patch.txt
    Original Estimate: 72h
    Remaining Estimate: 72h

Currently, Hive does not allow altering partition column types. As we've discouraged users from using non-string partition column types, this presents a problem for users who want to change their partition columns to be strings: they have to rename their table, create a new table, and copy all the data over. To support this via the CLI, add a command like

ALTER TABLE table_name PARTITION COLUMN (column_name new type);
[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo
[ https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-3968:

    Attachment: HIVE-3968.3.patch.txt

Enhance logging in TableAccessInfo

    Key: HIVE-3968
    URL: https://issues.apache.org/jira/browse/HIVE-3968
    Project: Hive
    Issue Type: Improvement
    Components: Query Processor
    Reporter: Kevin Wilfong
    Assignee: Kevin Wilfong
    Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, HIVE-3968.3.patch.txt

Based on what is currently available in the TableAccessInfo we can infer when it would be a good idea to add bucketing/sorting metadata for tables. However, we can't easily tell if we're already getting the benefits of bucketing/sorting. This information can be improved by
a) storing the input table/partition objects so that we can tell if the tables/partitions are already bucketed/sorted
b) running the TableAccessAnalyzer after the logical optimizer, so that we can tell from the operators whether or not we are already getting benefits (bucketed/sort merge map joins or map group bys)
[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo
[ https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-3968:

    Status: Patch Available (was: Open)

Enhance logging in TableAccessInfo

    Key: HIVE-3968
    URL: https://issues.apache.org/jira/browse/HIVE-3968
    Project: Hive
    Issue Type: Improvement
    Components: Query Processor
    Reporter: Kevin Wilfong
    Assignee: Kevin Wilfong
    Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, HIVE-3968.3.patch.txt

Based on what is currently available in the TableAccessInfo we can infer when it would be a good idea to add bucketing/sorting metadata for tables. However, we can't easily tell if we're already getting the benefits of bucketing/sorting. This information can be improved by
a) storing the input table/partition objects so that we can tell if the tables/partitions are already bucketed/sorted
b) running the TableAccessAnalyzer after the logical optimizer, so that we can tell from the operators whether or not we are already getting benefits (bucketed/sort merge map joins or map group bys)
[jira] [Commented] (HIVE-3968) Enhance logging in TableAccessInfo
[ https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582372#comment-13582372 ] Kevin Wilfong commented on HIVE-3968: - Refreshed. Enhance logging in TableAccessInfo -- Key: HIVE-3968 URL: https://issues.apache.org/jira/browse/HIVE-3968 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, HIVE-3968.3.patch.txt Based on what is currently available in the TableAccessInfo we can infer when it would be a good idea to add bucketing/sorting metadata for tables. However, we can't easily tell if we're already getting the benefits of bucketing/sorting. This information can be improved by a) storing the input table/partition objects so that we can tell if the tables/partitions are already bucketed/sorted b) running the TableAccessAnalyzer after the logical optimizer, so that we can tell from the operators whether or not we are already getting benefits (bucketed/sort merge map joins or map group bys)
[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener
[ https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3970: Attachment: HIVE-3970.3.patch.txt Clean up/fix PartitionNameWhitelistPreEventListener --- Key: HIVE-3970 URL: https://issues.apache.org/jira/browse/HIVE-3970 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, HIVE-3970.3.patch.txt There are a number of issues and things which can be cleaned up related to PartitionNameWhitelistPreEventListener. * It's an event listener, but it doesn't need to be: given that the regex whitelist is configurable, it could just be a utility method. * It's not run when a partition is renamed, so partitions with invalid characters can be created in this way. * There's no easy way to check if a partition contains invalid characters before creating it and seeing if it fails. Most importantly, when a dynamic partition contains an invalid character, the directory for this partition is created, and the data is moved into it, but the partition fails to be created, leaving an orphan directory.
[jira] [Updated] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener
[ https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3970: Status: Patch Available (was: Open) Clean up/fix PartitionNameWhitelistPreEventListener --- Key: HIVE-3970 URL: https://issues.apache.org/jira/browse/HIVE-3970 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, HIVE-3970.3.patch.txt There are a number of issues and things which can be cleaned up related to PartitionNameWhitelistPreEventListener. * It's an event listener, but it doesn't need to be: given that the regex whitelist is configurable, it could just be a utility method. * It's not run when a partition is renamed, so partitions with invalid characters can be created in this way. * There's no easy way to check if a partition contains invalid characters before creating it and seeing if it fails. Most importantly, when a dynamic partition contains an invalid character, the directory for this partition is created, and the data is moved into it, but the partition fails to be created, leaving an orphan directory.
[jira] [Commented] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener
[ https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582386#comment-13582386 ] Kevin Wilfong commented on HIVE-3970: - Refreshed Clean up/fix PartitionNameWhitelistPreEventListener --- Key: HIVE-3970 URL: https://issues.apache.org/jira/browse/HIVE-3970 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, HIVE-3970.3.patch.txt There are a number of issues and things which can be cleaned up related to PartitionNameWhitelistPreEventListener. * It's an event listener, but it doesn't need to be: given that the regex whitelist is configurable, it could just be a utility method. * It's not run when a partition is renamed, so partitions with invalid characters can be created in this way. * There's no easy way to check if a partition contains invalid characters before creating it and seeing if it fails. Most importantly, when a dynamic partition contains an invalid character, the directory for this partition is created, and the data is moved into it, but the partition fails to be created, leaving an orphan directory.
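The "utility method" idea in HIVE-3970 can be illustrated with a small sketch. This is not the patch and not Hive's API: the function names and the whitelist pattern below are assumptions made up for illustration (the real whitelist is a configurable regex whose value is not given in the issue). The point is only that a plain function can be called before any directory is created, and from the rename path as well.

```python
import re

# Hypothetical whitelist: letters, digits, underscore, dot and hyphen.
# The real pattern is configurable in Hive and is not stated in the issue.
WHITELIST = re.compile(r"[A-Za-z0-9_.\-]+")


def invalid_partition_columns(part_spec, pattern=WHITELIST):
    """Return the partition columns whose values do not fully match the whitelist."""
    return [col for col, val in part_spec.items() if not pattern.fullmatch(val)]


def check_partition_spec(part_spec, pattern=WHITELIST):
    """Raise before any directory is created or data is moved (hypothetical helper)."""
    bad = invalid_partition_columns(part_spec, pattern)
    if bad:
        raise ValueError("partition values violate whitelist for columns: %s" % bad)
```

Calling `check_partition_spec` first, for both create and rename, would reject a bad name before the directory exists, which is the orphan-directory case the issue describes.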
[jira] [Resolved] (HIVE-4040) fix ptf negative tests
[ https://issues.apache.org/jira/browse/HIVE-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4040. Resolution: Fixed Committed to branch. Thanks, Prajakta! fix ptf negative tests -- Key: HIVE-4040 URL: https://issues.apache.org/jira/browse/HIVE-4040 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Prajakta Kalmegh Priority: Minor Attachments: HIVE-4040.1.patch.txt Fix queries in negative tests to match language changes.
[jira] [Created] (HIVE-4043) Parallel Hive Queries: Sporadic Errors of form: Error in metadata: javax.jdo.JDOFatalDataStoreException: IO Error: Connection reset
Andrew Tindle created HIVE-4043: --- Summary: Parallel Hive Queries: Sporadic Errors of form: Error in metadata: javax.jdo.JDOFatalDataStoreException: IO Error: Connection reset Key: HIVE-4043 URL: https://issues.apache.org/jira/browse/HIVE-4043 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0 Environment: O/S: RHEL 6.3 Metastore: Oracle 11gR2 Reporter: Andrew Tindle I have a program that spawns Hive queries/processes, up to a maximum of 5, in parallel. When the number of running queries drops below that maximum, i.e. a process has ended, another Hive query/process is started. Sometimes this program works, i.e. all 34 queries process successfully. However, on other occasions, I get sporadic instances of the following error for some of the queries: FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: IO Error: Connection reset NestedThrowables: java.sql.SQLRecoverableException: IO Error: Connection reset FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask Can anyone help in identifying/resolving why this occurs? It looks to me as if there is some kind of race condition/collision with the Hive Metastore, which is hosted in an Oracle DB on the same node as the Hadoop infrastructure (single node).
[jira] [Created] (HIVE-4044) Add URL type
Samuel Yuan created HIVE-4044: - Summary: Add URL type Key: HIVE-4044 URL: https://issues.apache.org/jira/browse/HIVE-4044 Project: Hive Issue Type: Improvement Reporter: Samuel Yuan Assignee: Samuel Yuan Having a separate type for URLs would enable improvements in storage efficiency based on breaking up a URL into its components. The new type will be named URL and made a non-reserved keyword (see HIVE-701).
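As a rough illustration of "breaking up a URL into its components" (HIVE-4044), the sketch below uses Python's standard `urllib.parse.urlsplit`. It says nothing about how the proposed Hive type would store data; the helper name `url_components` is an assumption for this example. It only shows that components such as scheme and host, which repeat across many rows, become separate values that a storage layer could encode once.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into named parts (illustrative helper, not a Hive API)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(url_components("https://issues.apache.org/jira/browse/HIVE-4044?page=all#top"))
```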
[jira] [Updated] (HIVE-4005) Column truncation
[ https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-4005: Attachment: HIVE-4005.3.patch.txt Column truncation - Key: HIVE-4005 URL: https://issues.apache.org/jira/browse/HIVE-4005 Project: Hive Issue Type: New Feature Components: CLI Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, HIVE-4005.3.patch.txt Column truncation allows users to remove data for columns that are no longer useful. This is done by removing the data for the column and setting the length of the column data and related lengths to 0 in the RC file header. RC file was fixed to recognize columns with lengths of zero as empty; they are treated as if the column doesn't exist in the data, so a null is returned for every value of that column in every row. This is the same thing that happens when more columns are selected than exist in the file. A new command was added to the CLI: TRUNCATE TABLE ... PARTITION ... COLUMNS ... This launches a map-only job where each mapper rewrites a single file without the unnecessary column data and with the adjusted headers. It does not uncompress/deserialize the data, so it is much faster than rewriting the data with NULLs.
[jira] [Commented] (HIVE-4005) Column truncation
[ https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582413#comment-13582413 ] Kevin Wilfong commented on HIVE-4005: - Updated Column truncation - Key: HIVE-4005 URL: https://issues.apache.org/jira/browse/HIVE-4005 Project: Hive Issue Type: New Feature Components: CLI Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, HIVE-4005.3.patch.txt Column truncation allows users to remove data for columns that are no longer useful. This is done by removing the data for the column and setting the length of the column data and related lengths to 0 in the RC file header. RC file was fixed to recognize columns with lengths of zero as empty; they are treated as if the column doesn't exist in the data, so a null is returned for every value of that column in every row. This is the same thing that happens when more columns are selected than exist in the file. A new command was added to the CLI: TRUNCATE TABLE ... PARTITION ... COLUMNS ... This launches a map-only job where each mapper rewrites a single file without the unnecessary column data and with the adjusted headers. It does not uncompress/deserialize the data, so it is much faster than rewriting the data with NULLs.
[jira] [Updated] (HIVE-4005) Column truncation
[ https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-4005: Status: Patch Available (was: Open) Column truncation - Key: HIVE-4005 URL: https://issues.apache.org/jira/browse/HIVE-4005 Project: Hive Issue Type: New Feature Components: CLI Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, HIVE-4005.3.patch.txt Column truncation allows users to remove data for columns that are no longer useful. This is done by removing the data for the column and setting the length of the column data and related lengths to 0 in the RC file header. RC file was fixed to recognize columns with lengths of zero as empty; they are treated as if the column doesn't exist in the data, so a null is returned for every value of that column in every row. This is the same thing that happens when more columns are selected than exist in the file. A new command was added to the CLI: TRUNCATE TABLE ... PARTITION ... COLUMNS ... This launches a map-only job where each mapper rewrites a single file without the unnecessary column data and with the adjusted headers. It does not uncompress/deserialize the data, so it is much faster than rewriting the data with NULLs.
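The read-side behaviour described in HIVE-4005 can be pictured with a toy model. This is only a sketch under loud assumptions: the dictionary layout, `truncate_columns`, and `read_column` are invented here and do not reflect the real RC file format or reader. The model keeps one "row group" with a recorded length per column chunk; truncation drops the chunk's data and zeroes its length, and the reader returns a null per row for any zero-length or missing column.

```python
def truncate_columns(row_group, columns):
    """Return a copy of the row group with the given columns' data dropped
    and their recorded length set to 0 (toy model, not the RC file format)."""
    chunks = dict(row_group["chunks"])
    for name in columns:
        chunks[name] = {"length": 0, "values": []}
    return {"num_rows": row_group["num_rows"], "chunks": chunks}


def read_column(row_group, name):
    """Read one column; zero-length or absent columns yield None for every row."""
    chunk = row_group["chunks"].get(name)
    if chunk is None or chunk["length"] == 0:
        return [None] * row_group["num_rows"]
    return list(chunk["values"])


if __name__ == "__main__":
    group = {
        "num_rows": 2,
        "chunks": {
            "id": {"length": 2, "values": ["1", "2"]},
            "notes": {"length": 6, "values": ["foo", "bar"]},
        },
    }
    truncated = truncate_columns(group, ["notes"])
    print(read_column(truncated, "id"), read_column(truncated, "notes"))
```

In this model a truncated column and a column that never existed in the file are indistinguishable to the reader, which matches the issue's remark about selecting more columns than exist.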
[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.
[ https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582423#comment-13582423 ] Hudson commented on HIVE-4039: -- Integrated in Hive-trunk-hadoop2 #130 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/130/]) HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause. (Jezn Xu via namit) (Revision 1448135) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java * /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q * /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause. -- Key: HIVE-4039 URL: https://issues.apache.org/jira/browse/HIVE-4039 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jean Xu Assignee: Jean Xu Priority: Minor Attachments: HIVE_4039.1.patch.txt Hive compiler fails with a NullPointerException in semantic analysis / optimisation stage when a boolean variable appears in the WHERE clause in some cases. A minimal query to generate this error is here: SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag; On the other hand, the following query is perfectly fine: SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag=TRUE;
[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type
[ https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582424#comment-13582424 ] Hudson commented on HIVE-4027: -- Integrated in Hive-trunk-hadoop2 #130 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/130/]) HIVE-4027 Thrift alter_table api doesnt validate column type (Gang Tim Liu via namit) (Revision 1448138) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138 Files : * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java Thrift alter_table api doesnt validate column type -- Key: HIVE-4027 URL: https://issues.apache.org/jira/browse/HIVE-4027 Project: Hive Issue Type: Bug Reporter: Gang Tim Liu Assignee: Gang Tim Liu Fix For: 0.11.0 Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3 The Thrift alter_table API doesn't validate column types, so an invalid column type can sneak in.
[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails
[ https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582425#comment-13582425 ] Hudson commented on HIVE-4004: -- Integrated in Hive-trunk-hadoop2 #130 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/130/]) HIVE-4004 Incorrect status for AddPartition metastore event if RawStore commit fails (Dilip Joseph via namit) (Revision 1448101) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101 Files : * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java Incorrect status for AddPartition metastore event if RawStore commit fails -- Key: HIVE-4004 URL: https://issues.apache.org/jira/browse/HIVE-4004 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0 Reporter: Dilip Joseph Assignee: Dilip Joseph Priority: Minor Fix For: 0.11.0 Attachments: HIVE-4004.1.patch.txt For ADD PARTITION operations, the AddPartitionEvent does not care whether the RawStore commit succeeded or not. This means that an AddPartitionEvent with status=true is fired even if the actual ADD PARTITION operation failed. This will confuse any AddPartitionEvent listeners. Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the status of the RawStore commit. Only AddPartitionEvent has this problem.
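The behaviour HIVE-4004 asks for, an event status that reflects the outcome of the commit, can be sketched as follows. This is a minimal Python model, not the Java patch: `FakeStore`, `AddPartitionEvent`, and `add_partition` are hypothetical stand-ins, and the only point is the ordering, namely that the status is taken from the commit result and the event is fired afterwards.

```python
class AddPartitionEvent:
    """Hypothetical event object carrying the partition and the final status."""

    def __init__(self, partition, status):
        self.partition = partition
        self.status = status


class FakeStore:
    """Stand-in for a metadata store whose commit can be made to fail."""

    def __init__(self, commit_ok=True):
        self.commit_ok = commit_ok
        self.partitions = []
        self.pending = []

    def open_transaction(self):
        self.pending = []

    def add_partition(self, partition):
        self.pending.append(partition)

    def commit_transaction(self):
        if self.commit_ok:
            self.partitions.extend(self.pending)
        self.pending = []
        return self.commit_ok

    def rollback_transaction(self):
        self.pending = []


def add_partition(store, listeners, partition):
    """Add a partition and notify listeners with the real commit outcome."""
    success = False
    try:
        store.open_transaction()
        store.add_partition(partition)
        success = store.commit_transaction()
    finally:
        if not success:
            store.rollback_transaction()
        # The event status is derived from the commit result, never assumed.
        event = AddPartitionEvent(partition, success)
        for listener in listeners:
            listener(event)
    return success
```

A listener registered here sees `status=False` when the commit fails, which is the case the issue says was previously reported as `status=true`.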
Hive-trunk-hadoop2 - Build # 130 - Still Failing
Changes for Build #98 Changes for Build #99 [kevinwilfong] HIVE-3940. Track columns accessed in each table in a query. (Samuel Yuan via kevinwilfong) Changes for Build #100 [namit] HIVE-3778 Add MapJoinDesc.isBucketMapJoin() as part of explain plan (Gang Tim Liu via namit) Changes for Build #101 Changes for Build #102 Changes for Build #103 Changes for Build #104 [hashutosh] HIVE-3977 : Hive 0.10 postgres schema script is broken (Johnny Zhang via Ashutosh Chauhan) [hashutosh] HIVE-3932 : Hive release tarballs don't contain PostgreSQL metastore scripts (Mark Grover via Ashutosh Chauhan) Changes for Build #105 [hashutosh] HIVE-3918 : Normalize more CRLF line endings (Mark Grover via Ashutosh Chauhan) [namit] HIVE-3917 Support noscan operation for analyze command (Gang Tim Liu via namit) Changes for Build #106 [namit] HIVE-3937 Hive Profiler (Pamela Vagata via namit) [hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port (Navis via Ashutosh Chauhan) Changes for Build #107 Changes for Build #108 Changes for Build #109 Changes for Build #110 [namit] HIVE-2839 Filters on outer join with mapjoin hint is not applied correctly (Navis via namit) Changes for Build #111 Changes for Build #112 [namit] HIVE-3998 Oracle metastore update script will fail when upgrading from 0.9.0 to 0.10.0 (Jarek and Mark via namit) [namit] HIVE-3999 Mysql metastore upgrade script will end up with different schema than the full schema load (Jarek and Mark via namit) Changes for Build #113 Changes for Build #114 [namit] HIVE-3995 PostgreSQL upgrade scripts are not valid (Jarek and Mark via namit) Changes for Build #115 Changes for Build #116 [namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility (Navis via namit) Changes for Build #117 Changes for Build #118 Changes for Build #119 Changes for Build #120 [kevinwilfong] HIVE-3252. 
Add environment context to metastore Thrift calls. (Samuel Yuan via kevinwilfong) Changes for Build #121 Changes for Build #122 Changes for Build #123 Changes for Build #124 Changes for Build #125 Changes for Build #126 [hashutosh] HIVE-4000 Hive client goes into infinite loop at 100% cpu (Owen Omalley via Ashutosh Chauhan) Changes for Build #127 [namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect name (Jarek Jarcec Cecho via namit) [hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after joining three tables on different keys (Ashutosh Chauhan) [namit] HIVE-4029 Hive Profiler dies with NPE (Brock Noland via namit) Changes for Build #128 [namit] HIVE-4023 Improve Error Logging in MetaStore (Bhushan Mandhani via namit) [namit] HIVE-3403 user should not specify mapjoin to perform sort-merge bucketed join (Namit Jain via Ashutosh) [namit] HIVE-4024 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0 (Jarek Jarcec Cecho via namit) Changes for Build #129 Changes for Build #130 [namit] HIVE-4027 Thrift alter_table api doesnt validate column type (Gang Tim Liu via namit) [namit] HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause. (Jezn Xu via namit) [namit] HIVE-4004 Incorrect status for AddPartition metastore event if RawStore commit fails (Dilip Joseph via namit) 34 tests failed. FAILED: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_1 Error Message: Unexpected exception See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. Stack Trace: junit.framework.AssertionFailedError: Unexpected exception See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. 
at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:5855) at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_1(TestCliDriver.java:3476) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at
[jira] [Created] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter
Li Yang created HIVE-4045: - Summary: Modify PreDropPartitionEvent to pass Table parameter Key: HIVE-4045 URL: https://issues.apache.org/jira/browse/HIVE-4045 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Li Yang Priority: Minor A MetaStorePreEventListener implementing onEvent(PreEventContext context) sometimes needs to access Table properties when handling a PreDropPartitionEvent.
Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #71
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/ -- [...truncated 62804 lines...] [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2013-02-20 13:52:43,989 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] Execution completed successfully [junit] Mapred Local Task Succeeded . Convert the Join into MapJoin [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/localscratchdir/hive_2013-02-20_13-52-39_182_5672754359019075109/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302201352_1636206819.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: 
default@testhivedrivertable [junit] Copying data from https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt [junit] Copying file: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] Table default.testhivedrivertable stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0] [junit] POSTHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/localscratchdir/hive_2013-02-20_13-52-47_166_6920110678730188468/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/localscratchdir/hive_2013-02-20_13-52-47_166_6920110678730188468/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/71/artifact/hive/build/service/tmp/hive_job_log_jenkins_201302201352_119204245.txt [junit] PREHOOK: query: drop table 
testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit]
Re: Review Request: HIVE-3951: Allow Decimal type columns in Regex Serde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9173/#review16799 --- Ship it! Looks good to me (I'm not a committer). - Jarek Cecho On Jan. 31, 2013, 8:02 a.m., Mark Grover wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9173/ --- (Updated Jan. 31, 2013, 8:02 a.m.) Review request for hive. Description --- Add support in RegexSerDe for the newly added Decimal type. This addresses bug HIVE-3951. https://issues.apache.org/jira/browse/HIVE-3951 Diffs - ql/src/test/queries/clientpositive/serde_regex.q c3254ca ql/src/test/results/clientpositive/serde_regex.q.out a933538 serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java ae7693a Diff: https://reviews.apache.org/r/9173/diff/ Testing --- Added a client positive test. Thanks, Mark Grover
[jira] [Commented] (HIVE-3951) Allow Decimal type columns in Regex Serde
[ https://issues.apache.org/jira/browse/HIVE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582587#comment-13582587 ] Jarek Jarcec Cecho commented on HIVE-3951: -- +1 (non-binding) Allow Decimal type columns in Regex Serde - Key: HIVE-3951 URL: https://issues.apache.org/jira/browse/HIVE-3951 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: Mark Grover Assignee: Mark Grover Fix For: 0.11.0 Attachments: HIVE-3951.1.patch Decimal type in Hive was recently added by HIVE-2693. We should allow users to create tables with decimal type columns when using Regex Serde. HIVE-3004 did something similar for other primitive types.
[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-3996: - Attachment: HIVE-3996_3.patch Added a test case that demonstrates the issue when combining map-joins. This is an almost exact replica of the join32.q test with the size altered, but the current code would generate the same plan as join32.q even when the sum of the table sizes exceeds the size configured by noConditionalTask.size. Correctly enforce the memory limit on the multi-table map-join -- Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value.
[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-3996: - Status: Patch Available (was: Open) Correctly enforce the memory limit on the multi-table map-join -- Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
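A minimal sketch of the check this issue asks for, assuming the optimizer can see each small table's size in bytes. The class and method names are hypothetical and are not taken from the HIVE-3996 patch; only the idea of comparing the running sum against hive.auto.convert.join.noconditionaltask.size comes from the description above.

```java
/** Hypothetical helper for illustration; this is not Hive's optimizer code. */
final class MapJoinMergeSketch {
    private MapJoinMergeSketch() {}

    /**
     * Returns how many leading small tables can be merged into one task
     * without their combined size exceeding thresholdBytes.
     * Assumes sizes and the threshold are non-negative.
     */
    static int mergeablePrefix(long[] smallTableSizes, long thresholdBytes) {
        long total = 0;
        int count = 0;
        for (long size : smallTableSizes) {
            // Compare against the remaining budget so the sum cannot overflow.
            if (size > thresholdBytes - total) {
                break;
            }
            total += size;
            count++;
        }
        return count;
    }
}
```

With sizes {10, 20, 30} and a threshold of 35, the first two tables fit (30 bytes in total) and the third would have to go into a separate task, even though each table on its own is below the threshold.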
[jira] [Created] (HIVE-4046) Column masking
Samuel Yuan created HIVE-4046: - Summary: Column masking Key: HIVE-4046 URL: https://issues.apache.org/jira/browse/HIVE-4046 Project: Hive Issue Type: New Feature Components: CLI, Metastore, Query Processor Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Sometimes data in a table needs to be kept around but made inaccessible. Right now it is possible to offline a table or a partition, but not a specific column of a partition. Also, accessing an offlined table results in an error. With this change, it will be possible to mask a column at the partition level, causing all further queries to that column to return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Requests
Would someone have a chance to take a quick look at these review requests [1][2]?

[1] https://reviews.apache.org/r/9275/
[2] https://reviews.apache.org/r/9276/

Thanks,

On Tue, Feb 5, 2013 at 10:00 AM, kulkarni.swar...@gmail.com <kulkarni.swar...@gmail.com> wrote:
> Thanks Mark. Appreciate that. I'll take a look.
>
> On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover <grover.markgro...@gmail.com> wrote:
>> Swarnim, I left some comments on reviewboard.
>>
>> On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com <kulkarni.swar...@gmail.com> wrote:
>>> Hello, I opened up two reviews for small issues, HIVE-3553 [1] and HIVE-3725 [2]. If you guys get a chance to review them and provide feedback, I would really appreciate it.
>>>
>>> Thanks,
>>>
>>> [1] https://reviews.apache.org/r/9275/
>>> [2] https://reviews.apache.org/r/9276/
>>>
>>> -- Swarnim
>
> -- Swarnim

-- Swarnim
[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema
[ https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582617#comment-13582617 ] Joey Echeverria commented on HIVE-3528: --- Hey Michael, As a work around, did you try casting the null to the type of the column that you're inserting into? It's not ideal, but might be a workable interim solution. -Joey Avro SerDe doesn't handle serializing Nullable types that require access to a Schema Key: HIVE-3528 URL: https://issues.apache.org/jira/browse/HIVE-3528 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Sean Busbey Assignee: Sean Busbey Labels: avro Fix For: 0.11.0 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt Deserialization properly handles hiding Nullable Avro types, including complex types like record, map, array, etc. However, when Serialization attempts to write out these types it erroneously makes use of the UNION schema that contains NULL and the other type. This results in Schema mis-match errors for Record, Array, Enum, Fixed, and Bytes. Here's a [review board of unit tests that express the problem|https://reviews.apache.org/r/7431/], as well as one that supports the case that it's only when the schema is needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.
[ https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HIVE-3911:
---------------------------------------
Attachment: HIVE-3911_branch10.patch

Attaching HIVE-3911_branch10.patch. This should make it consistent. I have just removed the queries that cause the changes and fail this test.

udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.
Key: HIVE-3911
URL: https://issues.apache.org/jira/browse/HIVE-3911
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thiruvel Thirumoolan
Fix For: 0.11.0
Attachments: HIVE-3911_branch10.patch, HIVE-3911.patch

I am running Hive10 unit tests against Hadoop 0.23.5, and udaf_percentile_approx.q fails with a different value when map-side aggr is disabled, and only when the 3rd argument to this UDAF is 100. It matches the expected output when map-side aggr is enabled for the same arguments. This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x or 2.0.0-alpha or 2.0.2-alpha.

[junit] 20c20
[junit] < 254.083331
[junit] ---
[junit] > 252.77
[junit] 47c47
[junit] < 254.083331
[junit] ---
[junit] > 252.77
[junit] 74c74
[junit] < [23.358,254.083331,477.0625,489.54667]
[junit] ---
[junit] > [24.07,252.77,476.9,487.82]
[junit] 101c101
[junit] < [23.358,254.083331,477.0625,489.54667]
[junit] ---
[junit] > [24.07,252.77,476.9,487.82]

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.
[ https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HIVE-3911:
---------------------------------------
Fix Version/s: 0.10.1
Assignee: Thiruvel Thirumoolan

udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.
Key: HIVE-3911
URL: https://issues.apache.org/jira/browse/HIVE-3911
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
Fix For: 0.11.0, 0.10.1
Attachments: HIVE-3911_branch10.patch, HIVE-3911.patch

I am running Hive10 unit tests against Hadoop 0.23.5, and udaf_percentile_approx.q fails with a different value when map-side aggr is disabled, and only when the 3rd argument to this UDAF is 100. It matches the expected output when map-side aggr is enabled for the same arguments. This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x or 2.0.0-alpha or 2.0.2-alpha.

[junit] 20c20
[junit] < 254.083331
[junit] ---
[junit] > 252.77
[junit] 47c47
[junit] < 254.083331
[junit] ---
[junit] > 252.77
[junit] 74c74
[junit] < [23.358,254.083331,477.0625,489.54667]
[junit] ---
[junit] > [24.07,252.77,476.9,487.82]
[junit] 101c101
[junit] < [23.358,254.083331,477.0625,489.54667]
[junit] ---
[junit] > [24.07,252.77,476.9,487.82]

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations
[ https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582637#comment-13582637 ] Gang Tim Liu commented on HIVE-3741: https://reviews.facebook.net/D8715 Driver.validateConfVariables() should perform more validations -- Key: HIVE-3741 URL: https://issues.apache.org/jira/browse/HIVE-3741 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations
[ https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3741: --- Attachment: HIVE-3741.patch.1 Driver.validateConfVariables() should perform more validations -- Key: HIVE-3741 URL: https://issues.apache.org/jira/browse/HIVE-3741 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3741.patch.1 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-3741) Driver.validateConfVariables() should perform more validations
[ https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3741 started by Gang Tim Liu. Driver.validateConfVariables() should perform more validations -- Key: HIVE-3741 URL: https://issues.apache.org/jira/browse/HIVE-3741 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3741.patch.1 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations
[ https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3741: --- Status: Patch Available (was: In Progress) patch is available for review. Driver.validateConfVariables() should perform more validations -- Key: HIVE-3741 URL: https://issues.apache.org/jira/browse/HIVE-3741 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3741.patch.1 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4046) Column masking
[ https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4046: - Component/s: Security Authorization Column masking -- Key: HIVE-4046 URL: https://issues.apache.org/jira/browse/HIVE-4046 Project: Hive Issue Type: New Feature Components: Authorization, CLI, Metastore, Query Processor, Security Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Sometimes data in a table needs to be kept around but made inaccessible. Right now it is possible to offline a table or a partition, but not a specific column of a partition. Also, accessing an offlined table results in an error. With this change, it will be possible to mask a column at the partition level, causing all further queries to that column to return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4046) Column masking
[ https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582648#comment-13582648 ] Carl Steinbach commented on HIVE-4046: -- I think it's possible to accomplish most of this functionality using views in combination with authorization. I'm also concerned that with the proposed behavior users will have trouble differentiating between the case where they aren't allowed to read a column and the other case where they do have permission to read the column, but all of the values are actually NULL. Column masking -- Key: HIVE-4046 URL: https://issues.apache.org/jira/browse/HIVE-4046 Project: Hive Issue Type: New Feature Components: Authorization, CLI, Metastore, Query Processor, Security Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Sometimes data in a table needs to be kept around but made inaccessible. Right now it is possible to offline a table or a partition, but not a specific column of a partition. Also, accessing an offlined table results in an error. With this change, it will be possible to mask a column at the partition level, causing all further queries to that column to return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3720) Expand and standardize authorization in Hive
[ https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-3720: - Component/s: Security Expand and standardize authorization in Hive Key: HIVE-3720 URL: https://issues.apache.org/jira/browse/HIVE-3720 Project: Hive Issue Type: Improvement Components: Authorization, Security Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: Hive_Authorization_Functionality.pdf The existing implementation of authorization in Hive is not complete. Additionally the existing implementation has security holes. This JIRA is an umbrella JIRA for a) extending authorization to all SQL operations and direct metadata operations, and b) standardizing the authorization model and its semantics to mirror that of MySQL as closely as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4022) Structs and struct fields cannot be NULL in INSERT statements
[ https://issues.apache.org/jira/browse/HIVE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582662#comment-13582662 ]

Michael Malak commented on HIVE-4022:
-------------------------------------
Note that there is a workaround for setting individual STRUCT fields to NULL, but not for setting the whole STRUCT to NULL. The following workaround does work:

  INSERT INTO TABLE oc SELECT named_struct('a', cast(null as int), 'b', cast(null as int)) FROM tc;

But there is no equivalent workaround for the whole STRUCT, because casting to a STRUCT is not supported, as noted in the first comment of https://issues.apache.org/jira/browse/HIVE-1287

Structs and struct fields cannot be NULL in INSERT statements
Key: HIVE-4022
URL: https://issues.apache.org/jira/browse/HIVE-4022
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: Michael Malak

Originally thought to be Avro-specific, and first noted with respect to HIVE-3528 "Avro SerDe doesn't handle serializing Nullable types that require access to a Schema", it turns out even native Hive tables cannot store NULL in a STRUCT field or for the entire STRUCT itself, at least when the NULL is specified directly in the INSERT statement. Again, this affects both Avro-backed tables and native Hive tables.

***For native Hive tables:

The following:

  echo 1,2 >twovalues.csv
  hive
  CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
  LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
  CREATE TABLE oc (z STRUCT<a: int, b: int>);
  INSERT INTO TABLE oc SELECT null FROM tc;

produces the error:

  FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from void to struct<a:int,b:int>.

The following:

  INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;

produces the error:

  FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from struct<a:void,b:void> to struct<a:int,b:int>.

***For Avro:

In HIVE-3528, there is in fact a null-struct test case in line 14 of https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q does indeed work. But in that test, the query gets all of its data from a test table verbatim:

  INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If instead we put a hard-coded null for the struct directly into the query, it fails:

  INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

with the following error:

  FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different 'as_avro': Cannot convert column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

Note, though, that substituting a hard-coded null for string1 (and restoring struct1 into the query) does work:

  INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3528) Avro SerDe doesn't handle serializing Nullable types that require access to a Schema
[ https://issues.apache.org/jira/browse/HIVE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582664#comment-13582664 ] Michael Malak commented on HIVE-3528: - As noted in the first comment from https://issues.apache.org/jira/browse/HIVE-1287, casting to a STRUCT is not currently supported. However, I did just now try casting individual fields of a STRUCT and that indeed does work. I just now added details to the JIRA that I created last week. https://issues.apache.org/jira/browse/HIVE-4022 Avro SerDe doesn't handle serializing Nullable types that require access to a Schema Key: HIVE-3528 URL: https://issues.apache.org/jira/browse/HIVE-3528 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Sean Busbey Assignee: Sean Busbey Labels: avro Fix For: 0.11.0 Attachments: HIVE-3528.1.patch.txt, HIVE-3528.2.patch.txt Deserialization properly handles hiding Nullable Avro types, including complex types like record, map, array, etc. However, when Serialization attempts to write out these types it erroneously makes use of the UNION schema that contains NULL and the other type. This results in Schema mis-match errors for Record, Array, Enum, Fixed, and Bytes. Here's a [review board of unit tests that express the problem|https://reviews.apache.org/r/7431/], as well as one that supports the case that it's only when the schema is needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4039) Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.
[ https://issues.apache.org/jira/browse/HIVE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582676#comment-13582676 ]

Hudson commented on HIVE-4039:
------------------------------
Integrated in Hive-trunk-h0.21 #1978 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause. (Jezn Xu via namit) (Revision 1448135)

Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448135
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/test_boolean_whereclause.q
* /hive/trunk/ql/src/test/results/clientpositive/test_boolean_whereclause.q.out

Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause.
Key: HIVE-4039
URL: https://issues.apache.org/jira/browse/HIVE-4039
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Jean Xu
Assignee: Jean Xu
Priority: Minor
Attachments: HIVE_4039.1.patch.txt

Hive compiler fails with a NullPointerException in semantic analysis / optimisation stage when a boolean variable appears in the WHERE clause in some cases. A minimal query to generate this error is here:

  SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag;

On the other hand, the following query is perfectly fine:

  SELECT 1 FROM ( SELECT TRUE AS flag FROM dim_one_row:measurementsystems ) a WHERE flag=TRUE;

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4027) Thrift alter_table api doesnt validate column type
[ https://issues.apache.org/jira/browse/HIVE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582677#comment-13582677 ]

Hudson commented on HIVE-4027:
------------------------------
Integrated in Hive-trunk-h0.21 #1978 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4027 Thrift alter_table api doesnt validate column type (Gang Tim Liu via namit) (Revision 1448138)

Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448138
Files :
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java

Thrift alter_table api doesnt validate column type
Key: HIVE-4027
URL: https://issues.apache.org/jira/browse/HIVE-4027
Project: Hive
Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Fix For: 0.11.0
Attachments: HIVE-4027.patch.1, HIVE-4027.patch.2, HIVE-4027.patch.3

The Thrift alter_table API doesn't validate the column type, so an invalid column type can sneak in.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4004) Incorrect status for AddPartition metastore event if RawStore commit fails
[ https://issues.apache.org/jira/browse/HIVE-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582678#comment-13582678 ]

Hudson commented on HIVE-4004:
------------------------------
Integrated in Hive-trunk-h0.21 #1978 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1978/])
HIVE-4004 Incorrect status for AddPartition metastore event if RawStore commit fails (Dilip Joseph via namit) (Revision 1448101)

Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448101
Files :
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyListener.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestMetaStoreEventListenerOnlyOnCommit.java

Incorrect status for AddPartition metastore event if RawStore commit fails
Key: HIVE-4004
URL: https://issues.apache.org/jira/browse/HIVE-4004
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.10.0
Reporter: Dilip Joseph
Assignee: Dilip Joseph
Priority: Minor
Fix For: 0.11.0
Attachments: HIVE-4004.1.patch.txt

For ADD PARTITION operations, the AddPartitionEvent does not care if the RawStore commit succeeded or not. This means that an AddPartitionEvent with status=true is fired even if the actual ADD PARTITION operation failed. This will confuse any AddPartitionEvent listeners. Other MetastoreListenerEvents like CreateTableEvent correctly incorporate the status of the RawStore commit. Only AddPartitionEvent has this problem.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
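The behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the HIVE-4004 patch: Store, Event, FakeStore, and addPartition are hypothetical stand-ins for RawStore, AddPartitionEvent, and the metastore handler, and the only point shown is that the event status is taken from the result of the commit.

```java
import java.util.function.Consumer;

/** Hypothetical sketch; none of these types are Hive's real classes. */
final class CommitAwareEventSketch {
    private CommitAwareEventSketch() {}

    /** Stand-in for a transactional store. */
    interface Store {
        void open();
        boolean commit();   // false means the commit failed
        void rollback();
    }

    /** Stand-in for an event that carries only a status flag. */
    static final class Event {
        final boolean status;
        Event(boolean status) { this.status = status; }
    }

    /** Test double whose commit result is fixed at construction time. */
    static final class FakeStore implements Store {
        private final boolean commitSucceeds;
        boolean rolledBack;
        FakeStore(boolean commitSucceeds) { this.commitSucceeds = commitSucceeds; }
        @Override public void open() {}
        @Override public boolean commit() { return commitSucceeds; }
        @Override public void rollback() { rolledBack = true; }
    }

    /** Runs the work in a transaction and fires one event whose status reflects the commit. */
    static void addPartition(Store store, Runnable work, Consumer<Event> listener) {
        boolean success = false;
        store.open();
        try {
            work.run();
            success = store.commit(); // status is decided by the commit, not by the work alone
        } finally {
            if (!success) {
                store.rollback();
            }
            listener.accept(new Event(success));
        }
    }
}
```

Firing the event in the finally block means a listener still hears about the attempt when the work throws, but with status=false.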
Hive-trunk-h0.21 - Build # 1978 - Fixed
Changes for Build #1975 [namit] HIVE-4021 PostgreSQL upgrade scripts are creating column with incorrect name (Jarek Jarcec Cecho via namit) [hashutosh] HIVE-4033 : NPE at runtime while selecting virtual column after joining three tables on different keys (Ashutosh Chauhan) [namit] HIVE-4029 Hive Profiler dies with NPE (Brock Noland via namit) Changes for Build #1976 [namit] HIVE-4023 Improve Error Logging in MetaStore (Bhushan Mandhani via namit) [namit] HIVE-3403 user should not specify mapjoin to perform sort-merge bucketed join (Namit Jain via Ashutosh) [namit] HIVE-4024 Derby metastore update script will fail when upgrading from 0.9.0 to 0.10.0 (Jarek Jarcec Cecho via namit) Changes for Build #1977 Changes for Build #1978 [namit] HIVE-4027 Thrift alter_table api doesnt validate column type (Gang Tim Liu via namit) [namit] HIVE-4039 Hive compiler sometimes fails in semantic analysis / optimisation stage when boolean variable appears in WHERE clause. (Jezn Xu via namit) [namit] HIVE-4004 Incorrect status for AddPartition metastore event if RawStore commit fails (Dilip Joseph via namit) All tests passed The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1978) Status: Fixed Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1978/ to view the results.
[jira] [Work started] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3710 started by Gang Tim Liu. HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu It should be part of the plan instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3710: --- Attachment: HIVE-3710.patch.1 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1 It should be part of the plan instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582706#comment-13582706 ] Gang Tim Liu commented on HIVE-3710: https://reviews.facebook.net/D8721 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1 It should be part of the plan instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3710: --- Status: Patch Available (was: In Progress) patch is available. HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1 It should be part of the plan instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3992) Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks
[ https://issues.apache.org/jira/browse/HIVE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-3992:
--------------------------
Assignee: Gopal V
Release Note: Rely on previous sync-points when syncing within the same RCFile and avoid unnecessary I/O
Status: Patch Available (was: Open)

Patch optimizes for rcfile splits when they are being merged in a CombineFileSplit instance.

Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks
Key: HIVE-3992
URL: https://issues.apache.org/jira/browse/HIVE-3992
Project: Hive
Issue Type: Bug
Environment: Ubuntu x86_64/java-1.6/hadoop-2.0.3
Reporter: Gopal V
Assignee: Gopal V
Attachments: HIVE-3992.patch, select-join-limit.html

The following function does some bad I/O:

{code}
public synchronized void sync(long position) throws IOException {
  ...
  try {
    seek(position + 4); // skip escape
    in.readFully(syncCheck);
    int syncLen = sync.length;
    for (int i = 0; in.getPos() < end; i++) {
      int j = 0;
      for (; j < syncLen; j++) {
        if (sync[j] != syncCheck[(i + j) % syncLen]) {
          break;
        }
      }
      if (j == syncLen) {
        in.seek(in.getPos() - SYNC_SIZE); // position before sync
        return;
      }
      syncCheck[i % syncLen] = in.readByte();
    }
  }
  ...
}
{code}

This causes a rather large number of readByte() calls, which are passed on to a ByteBuffer via a single-byte array. As a result, a large amount of CPU is burnt in the linear search for the sync pattern in the input RCFile (up to 92% for a skewed example: a trivial map-join + limit 100).

Ideally this behaviour should be avoided altogether, or at least replaced by a rolling hash for efficient comparison, since the sync pattern has a known width of 16 bytes.

Attached the stack trace from a Yourkit profile.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
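The rolling-hash alternative mentioned in the description could look roughly like the sketch below. It is not the attached patch, which (per the release note) relies on previous sync points instead. It assumes the caller has already read a block of the file into a byte array, so the search costs no per-byte I/O; the class name, method name, and BASE constant are hypothetical.

```java
/** Hypothetical sketch of a rolling-hash (Rabin-Karp style) search; not RCFile code. */
final class SyncMarkerSearchSketch {
    private static final int BASE = 31;

    private SyncMarkerSearchSketch() {}

    /** Returns the index of the first occurrence of marker in buf[0..len), or -1 if absent. */
    static int indexOf(byte[] buf, int len, byte[] marker) {
        int m = marker.length;
        if (m == 0) {
            return 0;
        }
        if (len < m) {
            return -1;
        }
        // BASE^(m-1); int overflow wraps, which is consistent for every hash computed here.
        int high = 1;
        for (int i = 1; i < m; i++) {
            high *= BASE;
        }
        int target = 0;
        int window = 0;
        for (int i = 0; i < m; i++) {
            target = target * BASE + (marker[i] & 0xff);
            window = window * BASE + (buf[i] & 0xff);
        }
        for (int start = 0; ; start++) {
            // Hashes can collide, so confirm byte by byte before reporting a match.
            if (window == target && matches(buf, start, marker)) {
                return start;
            }
            if (start + m >= len) {
                return -1;
            }
            // Slide the window: drop buf[start], append buf[start + m].
            window = (window - (buf[start] & 0xff) * high) * BASE + (buf[start + m] & 0xff);
        }
    }

    private static boolean matches(byte[] buf, int start, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[start + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Each position costs one hash update instead of up to 16 byte comparisons, and the full comparison only runs when the hashes agree.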
[jira] [Commented] (HIVE-4046) Column masking
[ https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582759#comment-13582759 ] Justin Boseant commented on HIVE-4046: -- The problem with using authorization is that querying one of these columns is going to result in an error / failed query. The requested functionality requires that we succeed the query and mask the data. Column masking -- Key: HIVE-4046 URL: https://issues.apache.org/jira/browse/HIVE-4046 Project: Hive Issue Type: New Feature Components: Authorization, CLI, Metastore, Query Processor, Security Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Sometimes data in a table needs to be kept around but made inaccessible. Right now it is possible to offline a table or a partition, but not a specific column of a partition. Also, accessing an offlined table results in an error. With this change, it will be possible to mask a column at the partition level, causing all further queries to that column to return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4046) Column masking
[ https://issues.apache.org/jira/browse/HIVE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582772#comment-13582772 ] Carl Steinbach commented on HIVE-4046: -- Here's what I meant: {code} CREATE TABLE emp ( name STRING, title STRING, salary INT ); CREATE VIEW emp_masked AS SELECT name, title, NULL FROM emp; {code} Then use authorization to restrict access to the underlying emp table. Regardless of which approach is used, I think it would be good to write up a proposal explaining the functional and implementation details before writing any code. Column masking -- Key: HIVE-4046 URL: https://issues.apache.org/jira/browse/HIVE-4046 Project: Hive Issue Type: New Feature Components: Authorization, CLI, Metastore, Query Processor, Security Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Sometimes data in a table needs to be kept around but made inaccessible. Right now it is possible to offline a table or a partition, but not a specific column of a partition. Also, accessing an offlined table results in an error. With this change, it will be possible to mask a column at the partition level, causing all further queries to that column to return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-3710: - Status: Open (was: Patch Available) bq. It should be part of the plan instead. Why should it be part of the plan? Is this patch intended to resolve incorrect behavior, or is it a performance optimization, or ...? Please add a test case. HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1 It should be part of the plan instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582797#comment-13582797 ] Gang Tim Liu commented on HIVE-3710: It's a follow-up to HIVE-3706. It falls under performance optimization, although the gain is not as big as with HIVE-3706. HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1 It should be part of the plan instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582804#comment-13582804 ] Gang Tim Liu commented on HIVE-3710: It's not a new feature; it moves code from the run-time path to the compilation path in order to improve performance. I thought the existing statistics-related test cases already provide good coverage. Please let me know your thoughts and whether that makes sense. I will act accordingly. Thanks a lot. HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1 It should be part of the plan instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582809#comment-13582809 ] Gang Tim Liu commented on HIVE-3710: For example, stats0.q ... stats18.q are existing stats-related test cases. thanks a lot HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1 It should be part of the plan instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4002: -- Attachment: HIVE-4002.D8739.1.patch navis requested code review of HIVE-4002 [jira] Fetch task aggregation for simple group by query. Reviewers: JIRA HIVE-4002 Fetch task aggregation for simple group by query Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But that is too little work even for a single reducer, because most UDAFs generate just a single row from map-side aggregation. If the final fetch task can aggregate the outputs from the map tasks, shuffling time can be removed. This optimization transforms an operator tree such as TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS). With the patch, the time taken for the auto_join_filters.q test was reduced to 6 min (from 10 min). TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D8739 AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java ql/src/test/queries/clientpositive/fetch_aggregation.q 
ql/src/test/results/clientpositive/fetch_aggregation.q.out ql/src/test/results/compiler/plan/groupby1.q.xml ql/src/test/results/compiler/plan/groupby2.q.xml ql/src/test/results/compiler/plan/groupby3.q.xml ql/src/test/results/compiler/plan/groupby5.q.xml serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/21291/ To: JIRA, navis Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But that is too little work even for a single reducer, because most UDAFs generate just a single row from map-side aggregation. If the final fetch task can aggregate the outputs from the map tasks, shuffling time can be removed. This optimization transforms an operator tree such as TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS). With the patch, the time taken for the auto_join_filters.q test was reduced to 6 min (from 10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
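To make the idea in HIVE-4002 concrete: for a query like select count(*) from src, each map task emits one partial count, and the final step only has to add those rows together. The sketch below shows that merge step in isolation. It is an assumption-laden illustration, not code from the patch; the class FetchSideAggregation and the method mergeCounts are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class FetchSideAggregation {
    /**
     * Merges the per-map-task partial counts for count(*), i.e. the work the
     * final group-by would do if it ran inside the fetch task instead of a reducer.
     */
    public static long mergeCounts(List<Long> partialCounts) {
        long total = 0;
        for (long partial : partialCounts) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical partial results from three map tasks.
        System.out.println(mergeCounts(Arrays.asList(120L, 75L, 305L))); // prints 500
    }
}
```

Since the input is one small row per map task, doing this on the client side avoids scheduling a reducer and the shuffle that feeds it.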
[jira] [Updated] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4002: Status: Patch Available (was: Open) Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But that is too little work even for a single reducer, because most UDAFs generate just a single row from map-side aggregation. If the final fetch task can aggregate the outputs from the map tasks, shuffling time can be removed. This optimization transforms an operator tree such as TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS). With the patch, the time taken for the auto_join_filters.q test was reduced to 6 min (from 10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-948) more query plan optimization rules
[ https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582880#comment-13582880 ] Navis commented on HIVE-948: Ah, sorry. I'll update that. bq. Why this needs to be last optimizer? It doesn't update the info for the SEL, including colExprMap, etc. The optimizers that follow, like GlobalLimitOptimizer or SimpleFetchOptimizer, do not modify the operator tree. (They possibly update info, but I was even thinking of removing all of it in a CleanupProcessor, making the plan file smaller.) bq. Also, parent should always have child's schema, isnt it? I thought SEL(no-compute) does not have a schema because it just inherits that of its parent. I'll check it again. bq. Shouldn't parent be selectStar either when child is select-star or parent itself is select-star. I bail out in those situations before applying it, like this (in the missing file), because I'm not sure about it.
{code}
if (pSEL.getConf().isSelStarNoCompute()) {
  // SEL(no-compute)-SEL. never seen this condition, and removing parent is not safe in current graph walker
  return null;
}
{code}
more query plan optimization rules --- Key: HIVE-948 URL: https://issues.apache.org/jira/browse/HIVE-948 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Navis Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch Many query plans are not optimal in that they contain redundant operators. Some examples are unnecessary select operators (select followed by select, select output being the same as input etc.). Even though these operators are not very expensive, they could account for around 10% of CPU time in some simple queries. It seems they are low-hanging fruits that we should pick first. BTW, it seems these optimization rules should be added at the last stage of the physical optimization phase since some redundant operators are added to facilitate physical plan generation. 
[jira] [Updated] (HIVE-948) more query plan optimization rules
[ https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-948: - Attachment: HIVE-948.D8463.4.patch navis updated the revision HIVE-948 [jira] more query plan optimization rules. Added missing class, sorry Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D8463 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D8463?vs=27807id=28257#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java To: JIRA, ashutoshc, navis more query plan optimization rules --- Key: HIVE-948 URL: https://issues.apache.org/jira/browse/HIVE-948 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Navis Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch Many query plans are not optimal in that they contain redundant operators. Some examples are unnecessary select operators (select followed by select, select output being the same as input etc.). Even though these operators are not very expensive, they could account for around 10% of CPU time in some simple queries. It seems they are low-hanging fruits that we should pick first. BTW, it seems these optimization rules should be added at the last stage of the physical optimization phase since some redundant operators are added to facilitate physical plan generation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4025) Add reflect UDF for member method invocation of column
[ https://issues.apache.org/jira/browse/HIVE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582886#comment-13582886 ] Phabricator commented on HIVE-4025: --- navis has commented on the revision HIVE-4025 [jira] Add reflect UDF for member method invocation of column. INLINE COMMENTS ql/src/test/results/clientpositive/udf_reflect2.q.out:312 I'll update that. bq. The last columns seem to be wrong: It's the right result for the Timestamp class. getYear() * Returns a value that is the result of subtracting 1900 from the * year that contains or begins with the instant in time represented * by this <code>Date</code> object, as interpreted in the local * time zone. getMonth() * Returns a number representing the month that contains or begins * with the instant in time represented by this <tt>Date</tt> object. * The value returned is between <code>0</code> and <code>11</code>, * with the value <code>0</code> representing January. getDay() * Returns the day of the week represented by this date. The * returned value (<tt>0</tt> = Sunday, <tt>1</tt> = Monday, * <tt>2</tt> = Tuesday, <tt>3</tt> = Wednesday, <tt>4</tt> = * Thursday, <tt>5</tt> = Friday, <tt>6</tt> = Saturday) * represents the day of the week that contains or begins with * the instant in time represented by this <tt>Date</tt> object, * as interpreted in the local time zone. REVISION DETAIL https://reviews.facebook.net/D8601 To: JIRA, navis Cc: njain, brock Add reflect UDF for member method invocation of column -- Key: HIVE-4025 URL: https://issues.apache.org/jira/browse/HIVE-4025 Project: Hive Issue Type: Improvement Components: UDF Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-4025.D8601.1.patch There are many useful non-static methods on the Java classes behind primitive types, but the current reflect UDF cannot invoke them. For example, select reflect2(value, replace, val, VALUE) from src; which replaces 'val' part of value column with 'VALUE' -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
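The javadoc quoted in the HIVE-4025 comment explains why the last columns of the test output look odd at first glance. A small standalone check of those java.sql.Timestamp accessors (inherited from java.util.Date and deprecated) is sketched below; the class name TimestampLegacyFields and the sample date are made up for illustration and do not come from udf_reflect2.q.

```java
import java.sql.Timestamp;

public class TimestampLegacyFields {
    /** Returns {getYear(), getMonth(), getDay()} for the given timestamp. */
    @SuppressWarnings("deprecation")
    public static int[] legacyFields(Timestamp ts) {
        return new int[] {ts.getYear(), ts.getMonth(), ts.getDay()};
    }

    public static void main(String[] args) {
        // Parsed and read back in the local time zone, so the result is zone-independent.
        Timestamp ts = Timestamp.valueOf("2013-02-21 10:30:00");
        int[] f = legacyFields(ts);
        // year - 1900, zero-based month, day of week (Thursday = 4)
        System.out.println(f[0] + " " + f[1] + " " + f[2]); // prints 113 1 4
    }
}
```

So a year of 113, a month of 1 for February, and a "day" that is really the day of the week are all expected values, not bugs in the UDF.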
[jira] [Assigned] (HIVE-4045) Modify PreDropPartitionEvent to pass Table parameter
[ https://issues.apache.org/jira/browse/HIVE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain reassigned HIVE-4045: Assignee: Li Yang Modify PreDropPartitionEvent to pass Table parameter Key: HIVE-4045 URL: https://issues.apache.org/jira/browse/HIVE-4045 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Li Yang Assignee: Li Yang Priority: Minor MetaStorePreEventListener which implements onEvent(PreEventContext context) sometimes needs to access Table properties when PreDropPartitionEvent is listened to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations
[ https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582897#comment-13582897 ] Namit Jain commented on HIVE-3741: -- +1 Driver.validateConfVariables() should perform more validations -- Key: HIVE-3741 URL: https://issues.apache.org/jira/browse/HIVE-3741 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3741.patch.1 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4016) Remove init(fname) from TestParse.vm for each test
[ https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4016: -- Attachment: HIVE-4016.D8547.2.patch navis updated the revision HIVE-4016 [jira] Remove init(fname) from TestParse.vm for each test. Addressed comments (removed dummy incrementors and updated result plans) Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D8547 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D8547?vs=27657&id=28263#toc AFFECTED FILES ql/src/test/results/compiler/plan/case_sensitivity.q.xml ql/src/test/results/compiler/plan/cast1.q.xml ql/src/test/results/compiler/plan/groupby1.q.xml ql/src/test/results/compiler/plan/groupby2.q.xml ql/src/test/results/compiler/plan/groupby3.q.xml ql/src/test/results/compiler/plan/groupby4.q.xml ql/src/test/results/compiler/plan/groupby5.q.xml ql/src/test/results/compiler/plan/groupby6.q.xml ql/src/test/results/compiler/plan/input1.q.xml ql/src/test/results/compiler/plan/input2.q.xml ql/src/test/results/compiler/plan/input20.q.xml ql/src/test/results/compiler/plan/input3.q.xml ql/src/test/results/compiler/plan/input4.q.xml ql/src/test/results/compiler/plan/input5.q.xml ql/src/test/results/compiler/plan/input6.q.xml ql/src/test/results/compiler/plan/input7.q.xml ql/src/test/results/compiler/plan/input8.q.xml ql/src/test/results/compiler/plan/input9.q.xml ql/src/test/results/compiler/plan/input_part1.q.xml ql/src/test/results/compiler/plan/input_testsequencefile.q.xml ql/src/test/results/compiler/plan/input_testxpath.q.xml ql/src/test/results/compiler/plan/input_testxpath2.q.xml ql/src/test/results/compiler/plan/join1.q.xml ql/src/test/results/compiler/plan/join2.q.xml ql/src/test/results/compiler/plan/join3.q.xml ql/src/test/results/compiler/plan/join4.q.xml ql/src/test/results/compiler/plan/join5.q.xml ql/src/test/results/compiler/plan/join6.q.xml ql/src/test/results/compiler/plan/join7.q.xml ql/src/test/results/compiler/plan/join8.q.xml 
ql/src/test/results/compiler/plan/sample1.q.xml ql/src/test/results/compiler/plan/sample2.q.xml ql/src/test/results/compiler/plan/sample3.q.xml ql/src/test/results/compiler/plan/sample4.q.xml ql/src/test/results/compiler/plan/sample5.q.xml ql/src/test/results/compiler/plan/sample6.q.xml ql/src/test/results/compiler/plan/sample7.q.xml ql/src/test/results/compiler/plan/subq.q.xml ql/src/test/results/compiler/plan/udf1.q.xml ql/src/test/results/compiler/plan/udf4.q.xml ql/src/test/results/compiler/plan/udf6.q.xml ql/src/test/results/compiler/plan/udf_case.q.xml ql/src/test/results/compiler/plan/udf_when.q.xml ql/src/test/results/compiler/plan/union.q.xml ql/src/test/templates/TestParse.vm To: JIRA, ashutoshc, navis Remove init(fname) from TestParse.vm for each test -- Key: HIVE-4016 URL: https://issues.apache.org/jira/browse/HIVE-4016 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-4016.D8547.1.patch, HIVE-4016.D8547.2.patch TestParse does not change any configuration or data, which means calling the init() method before each test is not necessary. After removing it, test time was reduced from 260 sec to 16 sec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2843) UDAF to convert an aggregation to a map
[ https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2843: -- Attachment: HIVE-2843.D8745.1.patch navis requested code review of HIVE-2843 [jira] UDAF to convert an aggregation to a map. Reviewers: JIRA HIVE-2843 UDAF to convert an aggregation to a map I propose the addition of two new Hive UDAF to help with maps in Apache Hive. The source code is available on GitHub at https://github.com/wdavidw/hive-udf in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function convert an aggregation into a map and is internally using a Java `HashMap`. The second function extends the first one. It convert an aggregation into an ordered map and is internally using a Java `TreeMap`. They both extends the `AbstractGenericUDAFResolver` class. Also, I have covered the motivations and usages of those UDAF in a blog post at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/ The full patch is available with tests as well. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D8745 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFImplodeToMap.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFImplodeToOrderedMap.java ql/src/test/queries/clientpositive/implode_to_map.q ql/src/test/queries/clientpositive/implode_to_ordered_map.q ql/src/test/results/clientpositive/implode_to_map.q.out ql/src/test/results/clientpositive/implode_to_ordered_map.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? 
https://reviews.facebook.net/herald/transcript/21309/ To: JIRA, navis UDAF to convert an aggregation to a map --- Key: HIVE-2843 URL: https://issues.apache.org/jira/browse/HIVE-2843 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: David Worms Priority: Minor Labels: features, udf Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch I propose the addition of two new Hive UDAF to help with maps in Apache Hive. The source code is available on GitHub at https://github.com/wdavidw/hive-udf in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function convert an aggregation into a map and is internally using a Java `HashMap`. The second function extends the first one. It convert an aggregation into an ordered map and is internally using a Java `TreeMap`. They both extends the `AbstractGenericUDAFResolver` class. Also, I have covered the motivations and usages of those UDAF in a blog post at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/ The full patch is available with tests as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
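The difference between the two proposed functions in HIVE-2843 comes down to the backing collection. The sketch below shows that difference in plain Java, outside of any Hive UDAF machinery; it is not the code from the patch or from the linked GitHub repository, and the class ImplodeToMapSketch with its two methods is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ImplodeToMapSketch {
    /** Collects key/value pairs into a HashMap; iteration order is unspecified. */
    public static Map<String, Integer> toMap(String[] keys, int[] values) {
        return fill(new HashMap<String, Integer>(), keys, values);
    }

    /** Collects key/value pairs into a TreeMap; iteration follows sorted key order. */
    public static Map<String, Integer> toOrderedMap(String[] keys, int[] values) {
        return fill(new TreeMap<String, Integer>(), keys, values);
    }

    // If a key repeats, the later value replaces the earlier one (an assumption of this sketch).
    private static Map<String, Integer> fill(Map<String, Integer> target, String[] keys, int[] values) {
        for (int i = 0; i < keys.length; i++) {
            target.put(keys[i], values[i]);
        }
        return target;
    }

    public static void main(String[] args) {
        String[] keys = {"c", "a", "b"};
        int[] values = {3, 1, 2};
        System.out.println(toOrderedMap(keys, values)); // prints {a=1, b=2, c=3}
    }
}
```

Both variants return the same lookups; only the ordered one guarantees that the keys come back sorted, which matters when the map is rendered or compared in test output.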
[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map
[ https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582924#comment-13582924 ] Navis commented on HIVE-2843: - Made phabricator entry for quick review. I've used similar UDAF for implementing pivot feature and it was very useful. UDAF to convert an aggregation to a map --- Key: HIVE-2843 URL: https://issues.apache.org/jira/browse/HIVE-2843 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: David Worms Priority: Minor Labels: features, udf Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch I propose the addition of two new Hive UDAF to help with maps in Apache Hive. The source code is available on GitHub at https://github.com/wdavidw/hive-udf in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function convert an aggregation into a map and is internally using a Java `HashMap`. The second function extends the first one. It convert an aggregation into an ordered map and is internally using a Java `TreeMap`. They both extends the `AbstractGenericUDAFResolver` class. Also, I have covered the motivations and usages of those UDAF in a blog post at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/ The full patch is available with tests as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3968) Enhance logging in TableAccessInfo
[ https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582929#comment-13582929 ] Namit Jain commented on HIVE-3968: -- +1 Enhance logging in TableAccessInfo -- Key: HIVE-3968 URL: https://issues.apache.org/jira/browse/HIVE-3968 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, HIVE-3968.3.patch.txt Based on what is currently available in the TableAccessInfo we can infer when it would be a good idea to add bucketing/sorting metadata for tables. However, we can't easily tell if we're already getting the benefits of bucketing/sorting. This information can be improved by a) storing the input table/partition objects so that we can tell if the tables/partitions are already bucketed/sorted b) running the TableAccessAnalyzer after the logical optimizer, so that we can tell from the operators whether or not we are already getting benefits (bucketed/sort merge map joins or map group bys) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3996: - Status: Open (was: Patch Available) comments Correctly enforce the memory limit on the multi-table map-join -- Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
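The missing check described in HIVE-3996 amounts to comparing the combined size of all small tables in a merged task against the configured limit, rather than checking each table on its own. The sketch below illustrates only that arithmetic. It is not taken from any of the attached patches: the class MapJoinSizeCheck and the method fitsInMemory are hypothetical, and the threshold stands in for the value of hive.auto.convert.join.noconditionaltask.size.

```java
public class MapJoinSizeCheck {
    /**
     * Returns true if the small tables of all map-joins to be merged into one
     * task fit under the threshold together. Sizes are assumed non-negative.
     */
    public static boolean fitsInMemory(long[] smallTableSizes, long thresholdBytes) {
        long total = 0;
        for (long size : smallTableSizes) {
            total += size;
            if (total > thresholdBytes) {
                return false; // stop early once the limit is exceeded
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Each table is under 10 MB on its own, but together they are not.
        System.out.println(fitsInMemory(new long[] {4 * mb, 5 * mb, 3 * mb}, 10 * mb)); // prints false
    }
}
```

With per-table checks alone, the three tables in the usage example would each pass, which is the situation the issue describes.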
[jira] [Commented] (HIVE-3970) Clean up/fix PartitionNameWhitelistPreEventListener
[ https://issues.apache.org/jira/browse/HIVE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582934#comment-13582934 ] Namit Jain commented on HIVE-3970: -- +1 Clean up/fix PartitionNameWhitelistPreEventListener --- Key: HIVE-3970 URL: https://issues.apache.org/jira/browse/HIVE-3970 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3970.1.patch.txt, HIVE-3970.2.patch.txt, HIVE-3970.3.patch.txt There are a number of issues and things which can be cleaned up related to PartitionNameWhitelistPreEventListener. * It's an event listener, but it really doesn't need to be given that the regex whitelist is configurable, it could just be a utility method. * It's not run when a partition is renamed, so partitions with invalid characters can be created in this way. * There's no easy way to check if a partition contains invalid characters before creating it and seeing if it fails. Most importantly, when a dynamic partition contains an invalid character, the directory for this partition is created, and the data is moved into it, but the partition fails to be created leaving an orphan directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
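The first bullet in HIVE-3970 suggests that the whitelist check could be a plain utility method rather than an event listener. One way such a method might look is sketched below, so that it can be called both before creating a partition and on rename. The names PartitionNameCheck and allValuesAllowed are hypothetical, and the pattern in the usage example is only a sample, not Hive's default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PartitionNameCheck {
    /** Returns true only if every partition value fully matches the whitelist pattern. */
    public static boolean allValuesAllowed(List<String> partitionValues, Pattern whitelist) {
        for (String value : partitionValues) {
            if (!whitelist.matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample whitelist: letters, digits, underscore and hyphen.
        Pattern whitelist = Pattern.compile("[A-Za-z0-9_-]+");
        System.out.println(allValuesAllowed(Arrays.asList("2013-02-21", "us"), whitelist)); // prints true
        System.out.println(allValuesAllowed(Arrays.asList("a/b"), whitelist));              // prints false
    }
}
```

Running such a check before the directory for a dynamic partition is created would also address the orphan-directory problem mentioned at the end of the description.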
[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3710: --- Attachment: HIVE-3710.patch.2 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2 It should be part of the plan instead.
[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map
[ https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582947#comment-13582947 ] Phabricator commented on HIVE-2843: --- njain has commented on the revision HIVE-2843 [jira] UDAF to convert an aggregation to a map. INLINE COMMENTS ql/src/test/queries/clientpositive/implode_to_map.q:2 The code changes look good. Some minor comments: 1. Can you add describe implode_to_map and desc extended in the test? 2. Have you run all the tests? I think you need to update show_functions.q.out ql/src/test/queries/clientpositive/implode_to_map.q:24 Can you add some comments here: what is implode_to_map returning? Add a test where the 2nd arg to implode_to_map is a primitive type ql/src/test/queries/clientpositive/implode_to_ordered_map.q:25 Same as above. REVISION DETAIL https://reviews.facebook.net/D8745 To: JIRA, navis Cc: njain UDAF to convert an aggregation to a map --- Key: HIVE-2843 URL: https://issues.apache.org/jira/browse/HIVE-2843 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: David Worms Priority: Minor Labels: features, udf Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. The source code is available on GitHub at https://github.com/wdavidw/hive-udf in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function converts an aggregation into a map and internally uses a Java `HashMap`. The second function extends the first one. It converts an aggregation into an ordered map and internally uses a Java `TreeMap`. They both extend the `AbstractGenericUDAFResolver` class. Also, I have covered the motivations and usages of those UDAFs in a blog post at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/ The full patch is available with tests as well.
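The difference between the two proposed aggregations in HIVE-2843 can be sketched outside Hive. This is a Python illustration of the semantics only, not the Java UDAF code from the patch: the function names `to_map` and `to_ordered_map` are hypothetical, and the behaviour on duplicate keys (the last value wins, as with `HashMap.put`) is an assumption, since the description does not say how duplicates are handled.

```python
def to_map(rows):
    """Fold (key, value) rows into a map, like a HashMap-backed aggregate.

    Assumption: when a key repeats, the later row overwrites the earlier.
    """
    result = {}
    for key, value in rows:
        result[key] = value
    return result


def to_ordered_map(rows):
    """Same aggregation, but iteration follows key order, like a TreeMap."""
    # Python dicts keep insertion order, so inserting in sorted key order
    # yields a map that iterates by ascending key.
    return dict(sorted(to_map(rows).items()))
```

Both functions return the same key-value pairs; only the iteration order of the result differs.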
[jira] [Updated] (HIVE-3741) Driver.validateConfVariables() should perform more validations
[ https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3741: - Resolution: Fixed Fix Version/s: 0.11.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed. Thanks Tim Driver.validateConfVariables() should perform more validations -- Key: HIVE-3741 URL: https://issues.apache.org/jira/browse/HIVE-3741 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Fix For: 0.11.0 Attachments: HIVE-3741.patch.1 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.
[jira] [Commented] (HIVE-3741) Driver.validateConfVariables() should perform more validations
[ https://issues.apache.org/jira/browse/HIVE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582952#comment-13582952 ] Gang Tim Liu commented on HIVE-3741: Namit, thank you very much Tim Driver.validateConfVariables() should perform more validations -- Key: HIVE-3741 URL: https://issues.apache.org/jira/browse/HIVE-3741 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Fix For: 0.11.0 Attachments: HIVE-3741.patch.1 Like List Bucketing, it should also check for HIVE_OPTIMIZE_UNION_REMOVE.
[jira] [Updated] (HIVE-4005) Column truncation
[ https://issues.apache.org/jira/browse/HIVE-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4005: - Status: Open (was: Patch Available) comments Column truncation - Key: HIVE-4005 URL: https://issues.apache.org/jira/browse/HIVE-4005 Project: Hive Issue Type: New Feature Components: CLI Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4005.1.patch.txt, HIVE-4005.2.patch.txt, HIVE-4005.3.patch.txt Column truncation allows users to remove data for columns that are no longer useful. This is done by removing the data for the column and setting the length of the column data and related lengths to 0 in the RC file header. RC file was fixed to recognize columns with lengths of zero as empty. Such columns are treated as if they don't exist in the data: a null is returned for every value of that column in every row. This is the same thing that happens when more columns are selected than exist in the file. A new command was added to the CLI: TRUNCATE TABLE ... PARTITION ... COLUMNS ... This launches a map-only job where each mapper rewrites a single file without the unnecessary column data and with the adjusted headers. It does not uncompress/deserialize the data, so it is much faster than rewriting the data with NULLs.
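The read-side behaviour described in HIVE-4005 can be sketched with a toy model. This is not the RC file reader: `read_column` and its parameters are hypothetical, and a plain list of per-column lengths stands in for the lengths stored in the real header. It only shows that a truncated column (length 0) and a column that is absent from the file are handled the same way.

```python
def read_column(columns, column_lengths, index, num_rows):
    """Return the values of column `index` for one block of rows.

    Hypothetical model: `columns[i]` holds the stored values of column i
    and `column_lengths[i]` its recorded length (0 after truncation).
    """
    missing = index >= len(columns)
    truncated = not missing and column_lengths[index] == 0
    if missing or truncated:
        # Treated as if the column does not exist: one null per row.
        return [None] * num_rows
    return columns[index]
```

A query that selects a truncated column therefore sees the same all-null result as a query that selects a column the file never had.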
[jira] [Commented] (HIVE-4016) Remove init(fname) from TestParse.vm for each test
[ https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582970#comment-13582970 ] Ashutosh Chauhan commented on HIVE-4016: +1 Running tests. Remove init(fname) from TestParse.vm for each test -- Key: HIVE-4016 URL: https://issues.apache.org/jira/browse/HIVE-4016 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-4016.D8547.1.patch, HIVE-4016.D8547.2.patch TestParse does not change any configuration or data, which means calling the init() method before each test is not necessary. After removing it, test time was reduced from 260 sec to 16 sec.
[jira] [Updated] (HIVE-3968) Enhance logging in TableAccessInfo
[ https://issues.apache.org/jira/browse/HIVE-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3968: - Status: Open (was: Patch Available) The tests table_access_keys_stats.q and table_access_keys_stats2.q are failing Enhance logging in TableAccessInfo -- Key: HIVE-3968 URL: https://issues.apache.org/jira/browse/HIVE-3968 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3968.1.patch.txt, HIVE-3968.2.patch.txt, HIVE-3968.3.patch.txt Based on what is currently available in the TableAccessInfo we can infer when it would be a good idea to add bucketing/sorting metadata for tables. However, we can't easily tell if we're already getting the benefits of bucketing/sorting. This information can be improved by a) storing the input table/partition objects so that we can tell if the tables/partitions are already bucketed/sorted b) running the TableAccessAnalyzer after the logical optimizer, so that we can tell from the operators whether or not we are already getting benefits (bucketed/sort merge map joins or map group bys)
[jira] [Commented] (HIVE-948) more query plan optimization rules
[ https://issues.apache.org/jira/browse/HIVE-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582973#comment-13582973 ] Ashutosh Chauhan commented on HIVE-948: --- Makes sense. Navis, once you update the patch (a few more .q files have been added in trunk since you last updated it), I will get it in. more query plan optimization rules --- Key: HIVE-948 URL: https://issues.apache.org/jira/browse/HIVE-948 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Navis Attachments: HIVE-948.D8463.1.patch, HIVE-948.D8463.2.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.3.patch, HIVE-948.D8463.4.patch Many query plans are not optimal in that they contain redundant operators. Some examples are unnecessary select operators (select followed by select, select output being the same as input, etc.). Even though these operators are not very expensive, they could account for around 10% of CPU time in some simple queries. They seem to be low-hanging fruit that we should pick first. BTW, it seems these optimization rules should be added at the last stage of the physical optimization phase, since some redundant operators are added to facilitate physical plan generation.
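The two redundancies named in HIVE-948 (a select whose output equals its input, and a select directly followed by another select) can be sketched on a toy operator chain. This is not Hive's optimizer or the attached patch: `prune_selects` is a hypothetical function, operators are modelled as `(kind, columns)` pairs, selects are assumed to be pure column projections with no expressions, and every non-select operator is assumed to pass its input columns through unchanged.

```python
def prune_selects(operators, input_columns):
    """Drop identity selects and collapse back-to-back selects.

    Hypothetical model: `operators` is a list of (kind, columns) pairs
    applied in order to rows that start with `input_columns`.
    """
    pruned = []
    columns = list(input_columns)
    for kind, cols in operators:
        if kind != "SELECT":
            pruned.append((kind, cols))  # assumed to keep columns as-is
            continue
        if list(cols) == columns:
            continue  # output equals input: the select does nothing
        if pruned and pruned[-1][0] == "SELECT":
            # Select followed by select: only the later projection matters.
            pruned[-1] = ("SELECT", list(cols))
        else:
            pruned.append(("SELECT", list(cols)))
        columns = list(cols)
    return pruned
```

Real plans would also need the checks the description hints at, such as running the rule late enough that selects inserted for physical plan generation are no longer needed.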
[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3710: --- Attachment: HIVE-3710.patch.3 HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3 It should be part of the plan instead.
[jira] [Work started] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3710 started by Gang Tim Liu. HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3 It should be part of the plan instead.
[jira] [Updated] (HIVE-3710) HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3710: --- Status: Patch Available (was: In Progress) Added a new test case. Existing stats-related test cases cover the case where hive.stats.collect.rawdatasize is true. The new test case compares the results with the config on and off, to ensure HIVE-3710 keeps the existing logic intact. The patch is available both as an attachment and on Phabricator. HiveConf.ConfVars.HIVE_STATS_COLLECT_RAWDATASIZE should not be checked in FileSinkOperator -- Key: HIVE-3710 URL: https://issues.apache.org/jira/browse/HIVE-3710 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Attachments: HIVE-3710.patch.1, HIVE-3710.patch.2, HIVE-3710.patch.3 It should be part of the plan instead.