[jira] [Commented] (HIVE-6806) CREATE TABLE should support STORED AS AVRO
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092487#comment-14092487 ] Lefty Leverenz commented on HIVE-6806:
--------------------------------------
Thanks Ashish, your doc changes look good. I'm just making a few minor edits. This sentence in the Avro SerDe doc is out of date: "The AvroSerde has been built and tested against Hive 0.9.1 and Avro 1.5."
# Can I change it to "tested against Hive 0.9.1 and later"?
# What Avro versions have been tested? (Their latest is 1.7.7: http://avro.apache.org/releases.html.)

CREATE TABLE should support STORED AS AVRO
------------------------------------------
Key: HIVE-6806
URL: https://issues.apache.org/jira/browse/HIVE-6806
Project: Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
Labels: Avro, TODOC14
Fix For: 0.14.0
Attachments: HIVE-6806.1.patch, HIVE-6806.2.patch, HIVE-6806.3.patch, HIVE-6806.patch

Avro is well established and widely used within Hive, but creating Avro-backed tables currently requires messily listing the SerDe, InputFormat, and OutputFormat classes. As with HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support.
--
This message was sent by Atlassian JIRA (v6.2#6252)
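For reference, here is a sketch of the contrast this issue asks for: the verbose pre-HIVE-6806 DDL versus the proposed shorthand. The table name, columns, and schema URL below are illustrative assumptions; the SerDe, InputFormat, and OutputFormat class names are the standard Avro classes shipped with Hive.
{code:sql}
-- Without native support: every Avro table must spell out all three classes.
CREATE TABLE episodes_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/episodes.avsc');

-- With the feature from this issue (Hive 0.14.0+): same storage, one clause,
-- and the Avro schema is derived from the declared columns.
CREATE TABLE episodes_avro (title STRING, air_date STRING, doctor INT)
STORED AS AVRO;
{code}
The shorthand also removes a class of copy-paste errors, since mismatched SerDe/InputFormat/OutputFormat combinations can no longer be declared.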
[jira] [Assigned] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-7669:
---------------------------
Assignee: Navis

parallel order by clause on a string column fails with IOException: Split points are out of order
-------------------------------------------------------------------------------------------------
Key: HIVE-7669
URL: https://issues.apache.org/jira/browse/HIVE-7669
Project: Hive
Issue Type: Bug
Components: HiveServer2, Query Processor, SQL
Affects Versions: 0.12.0
Environment: Hive 0.12.0-cdh5.0.0, OS: Red Hat Linux
Reporter: Vishal Kamath
Assignee: Navis
Labels: orderby
Attachments: HIVE-7669.1.patch.txt

The source table has 600 million rows and a string column l_shipinstruct with 4 unique values (i.e., these 4 values are repeated across all 600 million rows). We sort on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
  l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate,
  l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem
order by l_shipinstruct;
{code}
Stack trace (diagnostic messages for this task):
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}
--
This message was sent by Atlassian JIRA (v6.2#6252)
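A note on the failure mode, plus a hedged workaround sketch (an inference from the stack trace, not something proposed in this issue): with only 4 distinct values of l_shipinstruct, the 1000 sampled keys cannot yield distinct split points, and TotalOrderPartitioner rejects a partition file whose split points are not strictly increasing. Until a patch lands, a sort on such a low-cardinality key could be expressed without the sampler:
{code:sql}
-- Hypothetical workaround 1: disable the sampled (parallel) order-by and
-- fall back to the single-reducer total order.
set hive.optimize.sampling.orderby=false;

-- Hypothetical workaround 2: accept a per-reducer order instead of a total
-- order. Each reducer receives all rows for its key(s) and sorts locally,
-- which stays parallel and never builds a partition file.
insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
  l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate,
  l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem
distribute by l_shipinstruct
sort by l_shipinstruct;
{code}
With only 4 distinct keys, distribute by can keep at most 4 reducers busy, and the output is ordered within each reducer's files rather than globally; whether that is acceptable depends on the downstream consumer.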
[jira] [Updated] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7669:
------------------------
Attachment: HIVE-7669.1.patch.txt

parallel order by clause on a string column fails with IOException: Split points are out of order
-------------------------------------------------------------------------------------------------
Key: HIVE-7669
URL: https://issues.apache.org/jira/browse/HIVE-7669
Project: Hive
Issue Type: Bug
Components: HiveServer2, Query Processor, SQL
Affects Versions: 0.12.0
Environment: Hive 0.12.0-cdh5.0.0, OS: Red Hat Linux
Reporter: Vishal Kamath
Labels: orderby
Attachments: HIVE-7669.1.patch.txt

The source table has 600 million rows and a string column l_shipinstruct with 4 unique values (i.e., these 4 values are repeated across all 600 million rows). We sort on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
  l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate,
  l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem
order by l_shipinstruct;
{code}
Stack trace (diagnostic messages for this task):
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7669:
------------------------
Status: Patch Available (was: Open)

Running preliminary test

parallel order by clause on a string column fails with IOException: Split points are out of order
-------------------------------------------------------------------------------------------------
Key: HIVE-7669
URL: https://issues.apache.org/jira/browse/HIVE-7669
Project: Hive
Issue Type: Bug
Components: HiveServer2, Query Processor, SQL
Affects Versions: 0.12.0
Environment: Hive 0.12.0-cdh5.0.0, OS: Red Hat Linux
Reporter: Vishal Kamath
Assignee: Navis
Labels: orderby
Attachments: HIVE-7669.1.patch.txt

The source table has 600 million rows and a string column l_shipinstruct with 4 unique values (i.e., these 4 values are repeated across all 600 million rows). We sort on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
  l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate,
  l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem
order by l_shipinstruct;
{code}
Stack trace (diagnostic messages for this task):
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7675) Implement native HiveMapFunction
[ https://issues.apache.org/jira/browse/HIVE-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7675:
--------------------------------
Issue Type: Sub-task (was: New Feature)
Parent: HIVE-7292

Implement native HiveMapFunction
--------------------------------
Key: HIVE-7675
URL: https://issues.apache.org/jira/browse/HIVE-7675
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li

Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7675) Implement native HiveMapFunction
Chengxiang Li created HIVE-7675:
-------------------------------
Summary: Implement native HiveMapFunction
Key: HIVE-7675
URL: https://issues.apache.org/jira/browse/HIVE-7675
Project: Hive
Issue Type: New Feature
Components: Spark
Reporter: Chengxiang Li

Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7676) Support more methods in DatabaseMetaData
Alexander Pivovarov created HIVE-7676:
-------------------------------------
Summary: Support more methods in DatabaseMetaData
Key: HIVE-7676
URL: https://issues.apache.org/jira/browse/HIVE-7676
Project: Hive
Issue Type: Improvement
Components: JDBC
Reporter: Alexander Pivovarov

I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirrel SQL show databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but not UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7676) Support more methods in DatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676:
--------------------------------------
Description:
I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but not UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29

was:
I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirrel SQL show databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but not UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29

Support more methods in DatabaseMetaData
----------------------------------------
Key: HIVE-7676
URL: https://issues.apache.org/jira/browse/HIVE-7676
Project: Hive
Issue Type: Improvement
Components: JDBC
Reporter: Alexander Pivovarov

I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but not UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7677) Implement native HiveReduceFunction
Chengxiang Li created HIVE-7677:
-------------------------------
Summary: Implement native HiveReduceFunction
Key: HIVE-7677
URL: https://issues.apache.org/jira/browse/HIVE-7677
Project: Hive
Issue Type: New Feature
Components: Spark
Reporter: Chengxiang Li

Similar to HiveMapFunction, we need to implement a native HiveReduceFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7677) Implement native HiveReduceFunction
[ https://issues.apache.org/jira/browse/HIVE-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7677:
--------------------------------
Issue Type: Sub-task (was: New Feature)
Parent: HIVE-7292

Implement native HiveReduceFunction
-----------------------------------
Key: HIVE-7677
URL: https://issues.apache.org/jira/browse/HIVE-7677
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li

Similar to HiveMapFunction, we need to implement a native HiveReduceFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7643) ExecMapper static states lead to unpredictable query result.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7643:
--------------------------------
Summary: ExecMapper static states lead to unpredictable query result.[Spark Branch] (was: ExecMapper statis states lead to unpredictable query result.[Spark Branch])

ExecMapper static states lead to unpredictable query result.[Spark Branch]
--------------------------------------------------------------------------
Key: HIVE-7643
URL: https://issues.apache.org/jira/browse/HIVE-7643
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li

ExecMapper contains static state (the static variable done, for example). A Spark executor may execute multiple tasks concurrently, so ExecMapper static state updated by one task can influence the logic of another task, which may lead to unpredictable results. To reproduce, execute
{code:sql}
SELECT COUNT(*) FROM TEST TABLESAMPLE(1 ROWS) s
{code}
where TEST is a table whose source data spans several blocks.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7675) Implement native HiveMapFunction
[ https://issues.apache.org/jira/browse/HIVE-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7675:
--------------------------------
Description:
Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions and processing logic.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.

was:
Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.

Implement native HiveMapFunction
--------------------------------
Key: HIVE-7675
URL: https://issues.apache.org/jira/browse/HIVE-7675
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li

Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions and processing logic.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23674: Handle db qualified names consistently across all HiveQL statements
On Aug. 11, 2014, 3:51 a.m., Lefty Leverenz wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 2071
https://reviews.apache.org/r/23674/diff/3/?file=657207#file657207line2071
"which in the form" should be "which is in the form"

fixed.

On Aug. 11, 2014, 3:51 a.m., Lefty Leverenz wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 2086
https://reviews.apache.org/r/23674/diff/3/?file=657207#file657207line2086
"which in the form" should be "which is in the form"

fixed.

On Aug. 11, 2014, 3:51 a.m., Lefty Leverenz wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java, lines 103-105
https://reviews.apache.org/r/23674/diff/3/?file=657209#file657209line103
Should "does contain database name" be "does not contain database name"?

You are right. Fixed again. Thanks.

- Navis

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23674/#review50132
---

On Aug. 11, 2014, 12:53 a.m., Navis Ryu wrote:
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23674/
---
(Updated Aug. 11, 2014, 12:53 a.m.)

Review request for hive and Thejas Nair.
Bugs: HIVE-4064
https://issues.apache.org/jira/browse/HIVE-4064
Repository: hive-git

Description
---
Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, others such as CREATE INDEX don't.
Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java c91b15c itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/CheckColumnAccessHook.java 14fc430 metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ea866c5 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 6e689d0 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5a56ced metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 760777a metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 74b1432 ql/src/java/org/apache/hadoop/hive/ql/Driver.java ea6ddbf ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 376e040 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d22b1f6 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 39b032e ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 2e32fee ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 989d0b5 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 22945e3 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java 939dc65 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 67a3aa7 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ab1188a ql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java 856ec2f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 7b86414 ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java 826bdf3 ql/src/java/org/apache/hadoop/hive/ql/plan/AlterIndexDesc.java 0318e4b ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableAlterPartDesc.java cf67e16 ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableSimpleDesc.java 541675c ql/src/java/org/apache/hadoop/hive/ql/plan/PrivilegeObjectDesc.java 9417220 ql/src/java/org/apache/hadoop/hive/ql/plan/RenamePartitionDesc.java 1b5fb9e 
ql/src/java/org/apache/hadoop/hive/ql/plan/ShowColumnsDesc.java fe6a91e ql/src/java/org/apache/hadoop/hive/ql/plan/ShowGrantDesc.java aa88153 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 5c94217 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java 9e9ef71 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveV1Authorizer.java fbc0090 ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 98c2924 ql/src/test/org/apache/hadoop/hive/ql/parse/TestQBCompact.java 5f32d5f ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/PrivilegesTestBase.java 93901ec ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/TestHiveAuthorizationTaskFactory.java ab0d80e ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/TestPrivilegesV1.java fd827ad ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/TestPrivilegesV2.java 9499986 ql/src/test/queries/clientpositive/alter_rename_table.q PRE-CREATION
[jira] [Updated] (HIVE-7648) authorization api should provide table/db object for create table/dbname
[ https://issues.apache.org/jira/browse/HIVE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7648:
--------------------------------
Status: Patch Available (was: Open)

authorization api should provide table/db object for create table/dbname
------------------------------------------------------------------------
Key: HIVE-7648
URL: https://issues.apache.org/jira/browse/HIVE-7648
Project: Hive
Issue Type: Bug
Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: HIVE-7648.1.patch

For create table, the Authorizer.checkPrivileges call provides only the database name. If the table name is also passed, it will be possible for the authorization API implementation to appropriately set the permissions of the new table. Similarly, in the case of create-database, the API call should provide a database object for the database being created.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7648) authorization api should provide table/db object for create table/dbname
[ https://issues.apache.org/jira/browse/HIVE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7648:
--------------------------------
Attachment: HIVE-7648.1.patch

HIVE-7648.1.patch - initial patch; more q.out files need to be updated. It also provides the database name in the case of 'use db;', and propagates base table information in index commands.

authorization api should provide table/db object for create table/dbname
------------------------------------------------------------------------
Key: HIVE-7648
URL: https://issues.apache.org/jira/browse/HIVE-7648
Project: Hive
Issue Type: Bug
Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: HIVE-7648.1.patch

For create table, the Authorizer.checkPrivileges call provides only the database name. If the table name is also passed, it will be possible for the authorization API implementation to appropriately set the permissions of the new table. Similarly, in the case of create-database, the API call should provide a database object for the database being created.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092514#comment-14092514 ] Navis commented on HIVE-4064:
-----------------------------
input3.q.out needed to be updated, but I cannot reproduce the failures in schemeAuthority.q and ql_rewrite_gbtoidx.q (tried with hadoop-1 and hadoop-2).

Handle db qualified names consistently across all HiveQL statements
-------------------------------------------------------------------
Key: HIVE-4064
URL: https://issues.apache.org/jira/browse/HIVE-4064
Project: Hive
Issue Type: Bug
Components: SQL
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Navis
Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt

Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, others such as CREATE INDEX don't.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4064:
------------------------
Attachment: HIVE-4064.8.patch.txt

Handle db qualified names consistently across all HiveQL statements
-------------------------------------------------------------------
Key: HIVE-4064
URL: https://issues.apache.org/jira/browse/HIVE-4064
Project: Hive
Issue Type: Bug
Components: SQL
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Navis
Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt

Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, others such as CREATE INDEX don't.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6099) Multi insert does not work properly with distinct count
[ https://issues.apache.org/jira/browse/HIVE-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092516#comment-14092516 ] Navis commented on HIVE-6099:
-----------------------------
[~leftylev] You are right, as always. I've confirmed that it's included in hive-0.11.0.
[~ashutoshc] I cannot be sure, but the optimization does not seem valid. If it will not be fixed before the next release, we should disable it by default.

Multi insert does not work properly with distinct count
-------------------------------------------------------
Key: HIVE-6099
URL: https://issues.apache.org/jira/browse/HIVE-6099
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
Reporter: Pavan Gadam Manohar
Assignee: Navis
Labels: count, distinct, insert, multi-insert
Attachments: explain_hive_0.10.0.txt, with_disabled.txt, with_enabled.txt

Two rows are enough to reproduce this bug. Here are the steps.

Step 1) Create a table Table_A:
{code:sql}
CREATE EXTERNAL TABLE Table_A
(
  user string,
  type int
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS RCFILE
LOCATION '/hive/path/Table_A';
{code}

Step 2) Scenario: let us say user tommy belongs to both user types 111 and 123. Insert 2 records into the table created above, then:
{noformat}
hive> select * from table_a;
OK
tommy 123 2013-12-02
tommy 111 2013-12-02
{noformat}

Step 3) Create 2 destination tables to simulate multi-insert:
{code:sql}
CREATE EXTERNAL TABLE dest_Table_A
(
  p_date string,
  Distinct_Users int,
  Type111Users int,
  Type123Users int
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS RCFILE
LOCATION '/hive/path/dest_Table_A';

CREATE EXTERNAL TABLE dest_Table_B
(
  p_date string,
  Distinct_Users int,
  Type111Users int,
  Type123Users int
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS RCFILE
LOCATION '/hive/path/dest_Table_B';
{code}

Step 4) Multi-insert statement:
{code:sql}
from Table_A a
INSERT OVERWRITE TABLE dest_Table_A PARTITION(dt='2013-12-02')
select a.dt,
  count(distinct a.user) as AllDist,
  count(distinct case when a.type = 111 then a.user else null end) as Type111User,
  count(distinct case when a.type != 111 then a.user else null end) as Type123User
group by a.dt
INSERT OVERWRITE TABLE dest_Table_B PARTITION(dt='2013-12-02')
select a.dt,
  count(distinct a.user) as AllDist,
  count(distinct case when a.type = 111 then a.user else null end) as Type111User,
  count(distinct case when a.type != 111 then a.user else null end) as Type123User
group by a.dt;
{code}

Step 5) Verify results:
{noformat}
hive> select * from dest_table_a;
OK
2013-12-02 2 1 1 2013-12-02
Time taken: 0.116 seconds
hive> select * from dest_table_b;
OK
2013-12-02 2 1 1 2013-12-02
Time taken: 0.13 seconds
{noformat}

Conclusion: Hive gives a count of 2 for distinct users although there is only one distinct user. After trying many datasets, we observed that Hive computes Type111Users + Type123Users = DistinctUsers, which is wrong:
{noformat}
hive> select count(distinct a.user) from table_a a;
Total MapReduce CPU Time Spent: 4 seconds 350 msec
OK
1
{noformat}
--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24377: HIVE-7142 Hive multi serialization encoding support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/ --- (Updated Aug. 11, 2014, 7:30 a.m.) Review request for hive. Bugs: HIVE-7142 https://issues.apache.org/jira/browse/HIVE-7142 Repository: hive-git Description --- Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This JIRA is dedicated to supporting serialization/deserialization of data in any encoding at the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK'); or ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. Diffs (updated) - serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25 serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java 179f9b5 serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java b7fb048 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java fb55c70 Diff: https://reviews.apache.org/r/24377/diff/ Testing --- Thanks, chengxiang li
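The transcoding this patch describes can be sketched outside Hive: bytes stored in the table's declared charset (GBK in the example) are decoded with that charset instead of being assumed UTF-8. The class and method names below are illustrative, not the patch's actual API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: decode raw on-disk bytes with the table's declared
// charset, as an encoding-aware SerDe would on deserialization.
public class EncodingSketch {
    static String decode(byte[] raw, Charset tableCharset) {
        return new String(raw, tableCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] raw = "\u4f60\u597d".getBytes(gbk); // non-ASCII text as stored under GBK
        String correct = decode(raw, gbk);          // decoded with the declared charset
        String garbled = new String(raw, StandardCharsets.UTF_8); // what UTF-8-only decoding yields
        System.out.println(correct.equals(garbled)); // false: the charsets disagree on these bytes
    }
}
```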
[jira] [Updated] (HIVE-7676) Support more methods in DatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676: -- Attachment: HIVE-7676.patch Support more methods in DatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but does not support UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24377: HIVE-7142 Hive multi serialization encoding support
On Aug. 11, 2014, 4:52 a.m., Brock Noland wrote: serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java, line 43 https://reviews.apache.org/r/24377/diff/3/?file=653662#file653662line43 Can we make these constants? serialization.encoding is probably already available somewhere. add serialization.encoding to serdeConstant class if that's what you mean here. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/#review50145 --- On Aug. 6, 2014, 9:11 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/ --- (Updated Aug. 6, 2014, 9:11 a.m.) Review request for hive. Bugs: HIVE-7142 https://issues.apache.org/jira/browse/HIVE-7142 Repository: hive-git Description --- Currently Hive only support serialize data into UTF-8 charset bytes or deserialize from UTF-8 bytes, real world users may want to load different kinds of encoded data into hive directly. This jira is dedicated to support serialize/deserialize all kinds of encoded data in SerDe layer. For user, only need to configure serialization encoding on table level by set serialization encoding through serde parameter, for example: CREATE TABLE person(id INT, name STRING, desc STRING)ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES(serialization.encoding='GBK'); or ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); LIMITATIONS: Only LazySimpleSerDe support serialization.encoding property in this patch. Diffs - serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java 179f9b5 serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java b7fb048 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java fb55c70 Diff: https://reviews.apache.org/r/24377/diff/ Testing --- Thanks, chengxiang li
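The reviewer's suggestion, sketched: hoist the property key into a shared constants class so callers share one definition instead of repeating the string literal. The class and field names here are hypothetical; the real key lives in serdeConstants.

```java
// Hypothetical sketch of the review comment: define the property key once.
public final class SerdeConstantsSketch {
    public static final String SERIALIZATION_ENCODING = "serialization.encoding";

    private SerdeConstantsSketch() {} // constants holder; no instances
}
```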
[jira] [Updated] (HIVE-7142) Hive multi serialization encoding support
[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7142: Attachment: HIVE-7142.3.patch Hive multi serialization encoding support - Key: HIVE-7142 URL: https://issues.apache.org/jira/browse/HIVE-7142 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This JIRA is dedicated to supporting serialization/deserialization of data in any encoding at the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: {code:sql} CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} or {code:sql} ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7676) Support more methods in DatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676: -- Description: I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.14.0 supports UNION ALL but does not support UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 was: I noticed that some methods in HiveDatabaseMetaData throws exceptions instead of returning true/false. Many JDBC clients expects implementations for particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also hive 0.13.1 supports UNION ALL and does not support UNION we can indicate this in HiveDatabaseMetaData instead of throwing Method Not supported exception. getIdentifierQuoteString should return space if not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 Support more methods in DatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true.
Also, Hive 0.14.0 supports UNION ALL but does not support UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7676) Support more methods in HiveDatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676: -- Summary: Support more methods in HiveDatabaseMetaData (was: Support more methods in DatabaseMetaData) Support more methods in HiveDatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.14.0 supports UNION ALL but does not support UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
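A minimal sketch of the behavior the issue asks for: capability methods return flags instead of throwing "Method not supported". The values mirror the description (UNION ALL supported, UNION not; a space when identifier quoting is unsupported); the class is illustrative, not Hive's actual HiveDatabaseMetaData.

```java
// Illustrative sketch: report capabilities as booleans rather than throwing.
public class MetaDataSketch {
    public boolean supportsSchemasInTableDefinitions() { return true; } // lets SQuirreL SQL list schemas
    public boolean supportsUnionAll() { return true; }                  // Hive supports UNION ALL
    public boolean supportsUnion() { return false; }                    // plain UNION unsupported here
    // Per the JDBC javadoc, return a single space when quoting is unsupported.
    public String getIdentifierQuoteString() { return " "; }
}
```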
[jira] [Updated] (HIVE-7532) allow disabling direct sql per query with external metastore
[ https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7532: Attachment: HIVE-7532.6.patch.txt allow disabling direct sql per query with external metastore Key: HIVE-7532 URL: https://issues.apache.org/jira/browse/HIVE-7532 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, HIVE-7532.2.patch.txt, HIVE-7532.3.patch.txt, HIVE-7532.4.patch.txt, HIVE-7532.5.patch.txt, HIVE-7532.6.patch.txt Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24137: allow disabling direct sql per query with external metastore
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24137/ --- (Updated Aug. 11, 2014, 7:37 a.m.) Review request for hive. Changes --- added getMetaConf(), which shows current value of the meta variable. Bugs: HIVE-7532 https://issues.apache.org/jira/browse/HIVE-7532 Repository: hive-git Description --- Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 common/src/java/org/apache/hadoop/hive/conf/SystemVariables.java ee98d17 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 9e416b5 metastore/if/hive_metastore.thrift 9e93b95 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 6e689d0 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 85a77d9 metastore/src/java/org/apache/hadoop/hive/metastore/IHMSHandler.java 1675751 metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 8746c37 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java c28c46a metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 86172b9 metastore/src/java/org/apache/hadoop/hive/metastore/events/ConfigChangeEvent.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 39b032e ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 2baa24a ql/src/test/queries/clientpositive/set_metaconf.q PRE-CREATION ql/src/test/results/clientpositive/set_metaconf.q.out PRE-CREATION service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 4c3164e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b39d64d 
service/src/java/org/apache/hive/service/cli/session/SessionManager.java c2f0495 Diff: https://reviews.apache.org/r/24137/diff/ Testing --- Thanks, Navis Ryu
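With this patch applied, overriding a metastore setting for a single session might look like the following. Hedged: the metaconf: prefix and the property name are inferred from this patch's SetProcessor/HiveConf changes; verify against the committed syntax.

```sql
-- Per-session override instead of the global metastore config:
set metaconf:hive.metastore.try.direct.sql=false;
-- Inspect the current value (getMetaConf() added in this update):
set metaconf:hive.metastore.try.direct.sql;
```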
[jira] [Commented] (HIVE-6806) CREATE TABLE should support STORED AS AVRO
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092534#comment-14092534 ] Lefty Leverenz commented on HIVE-6806: -- [~singhashish], why did you outdent union1 to bytes1 in the examples? I aligned them with the rest of the data types, then indented all of them two more spaces to make STORED AS AVRO stand out -- but if you wanted the outdent, please revert my changes or ask me to do it. Also, your example in Hive 0.14 and later versions under Creating Avro-backed Hive tables is identical to the one you added to the code block in All Hive versions just before it -- was that deliberate, or an editing artifact? It seems to me the Hive 0.14 example in All Hive versions isn't necessary, but I left it in for now. Please review my changes, because I moved some information around. * [Avro SerDe | https://cwiki.apache.org/confluence/display/Hive/AvroSerDe] CREATE TABLE should support STORED AS AVRO -- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro, TODOC14 Fix For: 0.14.0 Attachments: HIVE-6806.1.patch, HIVE-6806.2.patch, HIVE-6806.3.patch, HIVE-6806.patch Avro is well established and widely used within Hive, however creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24445: HIVE-7642, Set hive input format by configuration.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24445/ --- (Updated Aug. 11, 2014, 7:45 a.m.) Review request for hive, Brock Noland and Szehon Ho. Bugs: HIVE-7642 https://issues.apache.org/jira/browse/HIVE-7642 Repository: hive-git Description --- Currently the Hive input format is hard-coded as HiveInputFormat; we should set this parameter from configuration. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 45eff67 Diff: https://reviews.apache.org/r/24445/diff/ Testing --- Thanks, chengxiang li
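The change can be sketched as reading the input-format class name from configuration with a fallback, rather than hard-coding it. The hive.input.format property name is Hive's standard one; the Map stands in for a Hadoop Configuration, and the class name here is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: resolve the input format from configuration instead of hard-coding.
public class InputFormatSketch {
    static String inputFormatClass(Map<String, String> conf) {
        // Fall back to HiveInputFormat only when the property is unset.
        return conf.getOrDefault("hive.input.format",
            "org.apache.hadoop.hive.ql.io.HiveInputFormat");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(inputFormatClass(conf)); // default when unset
        conf.put("hive.input.format", "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat");
        System.out.println(inputFormatClass(conf)); // configured value wins
    }
}
```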
[jira] [Updated] (HIVE-7642) Set hive input format by configuration.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7642: Attachment: HIVE-7642.2-spark.patch Set hive input format by configuration.[Spark Branch] - Key: HIVE-7642 URL: https://issues.apache.org/jira/browse/HIVE-7642 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7642.1-spark.patch, HIVE-7642.2-spark.patch Currently the Hive input format is hard-coded as HiveInputFormat; we should set this parameter from configuration. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6329: Description: We have been receiving some requirements for encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows); .. OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} was: Receiving some requirements on encryption recently but hive is not supporting it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.indices'='2,3', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows); ..
OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} Support column level encryption/decryption -- Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt We have been receiving some requirements for encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows); .. OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
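The Base64WriteOnly output in the session above can be reproduced with plain java.util.Base64; the encoded address column decodes back to the inserted value. This sketch only mirrors the encoding step, not Hive's SerDe plumbing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Reproduce the encoded column value shown in the example session.
public class Base64Sketch {
    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(encode("Seoul, Seocho")); // U2VvdWwsIFNlb2Nobw==
    }
}
```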
[jira] [Commented] (HIVE-7142) Hive multi serialization encoding support
[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092539#comment-14092539 ] Navis commented on HIVE-7142: - I think this can be implemented on top of HIVE-6329. It seems to need some more checking (decoding should be applied to strings only, for example). Hive multi serialization encoding support - Key: HIVE-7142 URL: https://issues.apache.org/jira/browse/HIVE-7142 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This JIRA is dedicated to supporting serialization/deserialization of data in any encoding at the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: {code:sql} CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} or {code:sql} ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7623) hive partition rename fails if filesystem cache is disabled
[ https://issues.apache.org/jira/browse/HIVE-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7623: Attachment: HIVE-7623.1.patch.txt hive partition rename fails if filesystem cache is disabled --- Key: HIVE-7623 URL: https://issues.apache.org/jira/browse/HIVE-7623 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0, 0.13.1 Reporter: agate Attachments: HIVE-7623.1.patch.txt Seems to be an issue similar to https://issues.apache.org/jira/browse/HIVE-3815, hit when calling alterPartition (when renaming partitions). Setting fs.hdfs.impl.disable.cache=false and fs.file.impl.disable.cache=false works around this problem. Error: = 2014-08-05 21:46:14,522 ERROR [pool-3-thread-1]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - InvalidOperationException(message:table new location hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/XX=AA/YY=123 is on a different file system than the old location hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/XX=AA/YY=456.
This operation is not supported) at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartition(HiveAlterHandler.java:361) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:2629) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:2602) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at com.sun.proxy.$Proxy5.rename_partition(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$rename_partition.getResult(ThriftHiveMetastore.java:9057) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$rename_partition.getResult(ThriftHiveMetastore.java:9041) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) Looking at the code in apache-hive-0.13.1-src/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java on line 361, we can see that it is using != to compare FileSystem objects: // check that src and dest are on the same file system if (srcFs != destFs) { throw new InvalidOperationException("table new location " + destPath + " is on a different file system than the old location " + srcPath + ". This operation is not supported"); } -- This message was sent by Atlassian JIRA (v6.2#6252)
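One way to avoid the identity comparison: compare the filesystems' URIs (scheme and authority) instead of the objects themselves, so two cache-bypassing FileSystem instances for the same cluster still match. This sketch uses plain java.net.URI and hypothetical names; a real fix would work through FileSystem.getUri().

```java
import java.net.URI;

// With the FS cache disabled, srcFs != destFs is true even for the same
// cluster, so compare by scheme/authority rather than object identity.
public class FsCompareSketch {
    static boolean sameFileSystem(URI src, URI dest) {
        return eq(src.getScheme(), dest.getScheme())
            && eq(src.getAuthority(), dest.getAuthority());
    }

    private static boolean eq(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        URI a = URI.create("hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/p1");
        URI b = URI.create("hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/p2");
        System.out.println(sameFileSystem(a, b)); // true: same cluster, different paths
    }
}
```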
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092540#comment-14092540 ] Lefty Leverenz commented on HIVE-4064: -- +1 for the javadocs and code comments Handle db qualified names consistently across all HiveQL statements --- Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Navis Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt Hive doesn't consistently handle db-qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support db-qualified names, others such as CREATE INDEX don't. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7623) hive partition rename fails if filesystem cache is disabled
[ https://issues.apache.org/jira/browse/HIVE-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7623: Assignee: Navis Status: Patch Available (was: Open) Missed this in HIVE-3815 hive partition rename fails if filesystem cache is disabled --- Key: HIVE-7623 URL: https://issues.apache.org/jira/browse/HIVE-7623 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1, 0.13.0 Reporter: agate Assignee: Navis Attachments: HIVE-7623.1.patch.txt Seems to be an issue similar to https://issues.apache.org/jira/browse/HIVE-3815, hit when calling alterPartition (when renaming partitions). Setting fs.hdfs.impl.disable.cache=false and fs.file.impl.disable.cache=false works around this problem. Error: = 2014-08-05 21:46:14,522 ERROR [pool-3-thread-1]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - InvalidOperationException(message:table new location hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/XX=AA/YY=123 is on a different file system than the old location hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/XX=AA/YY=456.
This operation is not supported) at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartition(HiveAlterHandler.java:361) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:2629) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:2602) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at com.sun.proxy.$Proxy5.rename_partition(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$rename_partition.getResult(ThriftHiveMetastore.java:9057) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$rename_partition.getResult(ThriftHiveMetastore.java:9041) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) Looking at the code in apache-hive-0.13.1-src/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java on line 361, we can see that it is using != to compare FileSystem objects: // check that src and dest are on the same file system if (srcFs != destFs) { throw new InvalidOperationException("table new location " + destPath + " is on a different file system than the old location " + srcPath + ". This operation is not supported"); } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output
[ https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092547#comment-14092547 ] Hive QA commented on HIVE-7390: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660905/HIVE-7390.9.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/248/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/248/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-248/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660905 Make quote character optional and configurable in BeeLine CSV/TSV output Key: HIVE-7390 URL: https://issues.apache.org/jira/browse/HIVE-7390 Project: Hive Issue Type: New Feature Components: Clients Affects Versions: 0.13.1 Reporter: Jim Halfpenny Assignee: Ferdinand Xu Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, HIVE-7390.4.patch, HIVE-7390.5.patch, HIVE-7390.6.patch, HIVE-7390.7.patch, HIVE-7390.8.patch, HIVE-7390.9.patch, HIVE-7390.patch Currently when either the CSV or TSV output formats are used in beeline each column is wrapped in single quotes. Quote wrapping of columns should be optional and the user should be able to choose the character used to wrap the columns. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7641) INSERT ... SELECT with no source table leads to NPE
[ https://issues.apache.org/jira/browse/HIVE-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7641: Assignee: Navis Status: Patch Available (was: Open) INSERT ... SELECT with no source table leads to NPE --- Key: HIVE-7641 URL: https://issues.apache.org/jira/browse/HIVE-7641 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Lenni Kuff Assignee: Navis Attachments: HIVE-7641.1.patch.txt When no source table is provided for an INSERT statement, Hive fails with an NPE. {code} 0: jdbc:hive2://localhost:11050/default> create table test_tbl(i int); No rows affected (0.333 seconds) 0: jdbc:hive2://localhost:11050/default> insert into table test_tbl select 1; Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4) -- Get an NPE even when using incorrect syntax (no TABLE keyword) 0: jdbc:hive2://localhost:11050/default> insert into test_tbl select 1; Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4) -- Works when a source table is provided 0: jdbc:hive2://localhost:11050/default> insert into table test_tbl select 1 from foo; No rows affected (5.751 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7641) INSERT ... SELECT with no source table leads to NPE
[ https://issues.apache.org/jira/browse/HIVE-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7641: Attachment: HIVE-7641.1.patch.txt INSERT ... SELECT with no source table leads to NPE --- Key: HIVE-7641 URL: https://issues.apache.org/jira/browse/HIVE-7641 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Lenni Kuff Attachments: HIVE-7641.1.patch.txt When no source table is provided for an INSERT statement, Hive fails with an NPE. {code} 0: jdbc:hive2://localhost:11050/default> create table test_tbl(i int); No rows affected (0.333 seconds) 0: jdbc:hive2://localhost:11050/default> insert into table test_tbl select 1; Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4) -- Get an NPE even when using incorrect syntax (no TABLE keyword) 0: jdbc:hive2://localhost:11050/default> insert into test_tbl select 1; Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4) -- Works when a source table is provided 0: jdbc:hive2://localhost:11050/default> insert into table test_tbl select 1 from foo; No rows affected (5.751 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7624: - Attachment: HIVE-7624.5-spark.patch Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce functions in the wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7653) Hive AvroSerDe does not support circular references in Schema
[ https://issues.apache.org/jira/browse/HIVE-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092591#comment-14092591 ] Hive QA commented on HIVE-7653: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660923/HIVE-7653.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5889 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/249/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/249/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-249/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660923 Hive AvroSerDe does not support circular references in Schema - Key: HIVE-7653 URL: https://issues.apache.org/jira/browse/HIVE-7653 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Sachin Goyal Attachments: HIVE-7653.1.patch, HIVE-7653.2.patch Avro allows nullable circular references but Hive AvroSerDe does not. 
Example of circular references (passing in Avro but failing in AvroSerDe):
{code}
class AvroCycleParent {
  AvroCycleChild child;
  public AvroCycleChild getChild() { return child; }
  public void setChild(AvroCycleChild child) { this.child = child; }
}
class AvroCycleChild {
  AvroCycleParent parent;
  public AvroCycleParent getParent() { return parent; }
  public void setParent(AvroCycleParent parent) { this.parent = parent; }
}
{code}
Due to this discrepancy, Hive is unable to read Avro records that contain circular references. For third-party code with such references, it becomes very hard to serialize the objects directly with Avro and use them in Hive. I have a patch for this with a unit test and I will submit it shortly. -- This message was sent by Atlassian JIRA (v6.2#6252)
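For context, a nullable circular reference of this shape is legal in Avro: the recursive field is a union of null and a named-type reference back to an already-defined record. The sketch below builds such a schema as JSON in plain Python; the record and field names are taken from the example classes above, and the exact schema shape is an assumption, not taken from the patch.

```python
# Sketch of a self-referential Avro schema: AvroCycleChild refers back to
# AvroCycleParent by name instead of nesting another full definition.
import json

parent_schema = {
    "type": "record",
    "name": "AvroCycleParent",
    "fields": [{
        "name": "child",
        "type": ["null", {
            "type": "record",
            "name": "AvroCycleChild",
            "fields": [
                # A named-type reference, not a nested definition: this is
                # what makes the schema recursive (and nullable via the union).
                {"name": "parent", "type": ["null", "AvroCycleParent"]}
            ],
        }],
        "default": None,
    }],
}
print(json.dumps(parent_schema, indent=2))
```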
[jira] [Commented] (HIVE-7661) Observed performance issues while sorting using Hive's Parallel Order by clause while retaining pre-existing sort order.
[ https://issues.apache.org/jira/browse/HIVE-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092595#comment-14092595 ] Navis commented on HIVE-7661: - [~vishal.kamath] Thinking of implementing something like an InputSampler.SplitSampler. Would it be helpful for this case? Observed performance issues while sorting using Hive's Parallel Order by clause while retaining pre-existing sort order. Key: HIVE-7661 URL: https://issues.apache.org/jira/browse/HIVE-7661 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.12.0 Environment: Cloudera 5.0 hive-0.12.0-cdh5.0.0 Red Hat Linux Reporter: Vishal Kamath Labels: performance Fix For: 0.12.1 Improve Hive's sampling logic to accommodate use cases that require retaining the pre-existing sort order of the underlying source table. To support the parallel ORDER BY clause, Hive samples the source table based on the values provided to hive.optimize.sampling.orderby.number and hive.optimize.sampling.orderby.percent. This works with reasonable performance when sorting on columns with a random distribution of data, but it has severe performance issues when the pre-existing sort order is retained. Let us try to understand this with an example. insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_orderkey, l_partkey, l_suppkey; Sample data set for the lineitem table. The first column represents l_orderkey and is sorted.
l_orderkey|l_partkey|l_suppkey|l_linenumber|l_quantity|l_extendedprice|l_discount|l_tax|l_returnflag|l_linestatus|l_shipdate|l_commitdate|l_receiptdate|l_shipinstruct|l_shipmode|l_comment
197|1771022|96040|2|8|8743.52|0.09|0.02|A|F|1995-04-17|1995-07-01|1995-04-27|DELIVER IN PERSON|SHIP|y blithely even deposits. blithely fina|
197|1558290|83306|3|17|22919.74|0.06|0.02|N|O|1995-08-02|1995-06-23|1995-08-03|COLLECT COD|REG AIR|ts. careful|
197|179355|29358|4|25|35858.75|0.04|0.01|N|F|1995-06-13|1995-05-23|1995-06-24|TAKE BACK RETURN|FOB|s-- quickly final accounts|
197|414653|39658|5|14|21946.82|0.09|0.01|R|F|1995-05-08|1995-05-24|1995-05-12|TAKE BACK RETURN|RAIL|use slyly slyly silent depo|
197|1058800|8821|6|1|1758.75|0.07|0.05|N|O|1995-07-15|1995-06-21|1995-08-11|COLLECT COD|RAIL| even, thin dependencies sno|
198|560609|60610|1|33|55096.14|0.07|0.02|N|O|1998-01-05|1998-03-20|1998-01-10|TAKE BACK RETURN|TRUCK|carefully caref|
198|152287|77289|2|20|26785.60|0.03|0.00|N|O|1998-01-15|1998-03-31|1998-01-25|DELIVER IN PERSON|FOB|carefully final escapades a|
224|1899665|74720|3|41|68247.37|0.07|0.04|A|F|1994-09-01|1994-09-15|1994-09-02|TAKE BACK RETURN|SHIP|after the furiou|
When we sort on a presorted column, or do a multi-column sort that retains the sort order of the source table (lineitem, with 600 million rows), we don't see an equal distribution of data to the reducers. Out of 100 reducers, 99 complete in less than 40 seconds. The last reducer does the bulk of the work, processing nearly 570 million rows. So let us understand what is going wrong here: on a table having 600 million records with the orderkey column sorted, I created a temp table with 10% sampling.
insert overwrite table sampTempTbl (select * from lineitem tablesample (10 percent) t); select min(l_orderkey), max(l_orderkey) from sampTempTbl; 12306309,142321700 whereas on the source table, the orderkey range (select min(l_orderkey), max(l_orderkey) from lineitem) is 1 and 6 So naturally the bulk of the records will be directed towards a single reducer. One way to work around this problem is to increase hive.optimize.sampling.orderby.number to a larger value (close to the number of rows in the input source table). But then we will have to provide a higher heap (hive-env.sh) for Hive, otherwise it will fail while creating the sampling data. With larger data volumes, it is not practical to sample the entire data set. -- This message was sent by Atlassian JIRA (v6.2#6252)
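The skew described above can be reproduced outside Hive. The following simulation is illustrative Python, not Hive's actual partitioning code (the function names are invented): it derives split points from a sample taken from the head of a sorted key space, roughly what sampling a pre-sorted table yields, and shows the resulting partitioning sends almost every row to the last reducer, while a sample spread across the key space balances them.

```python
# Simulate total-order partitioning: split points come from a sample, and
# each key is routed to the reducer whose range contains it.
import bisect

def split_points(sample, num_reducers):
    # Pick num_reducers - 1 evenly spaced cut points from the sorted sample.
    sample = sorted(sample)
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]

def reducer_counts(keys, cuts):
    # Count how many keys land in each reducer's key range.
    counts = [0] * (len(cuts) + 1)
    for k in keys:
        counts[bisect.bisect_right(cuts, k)] += 1
    return counts

keys = list(range(600_000))   # already sorted, like l_orderkey
head_sample = keys[:600]      # a 0.1% sample drawn from the head of the table
skewed = reducer_counts(keys, split_points(head_sample, 4))

random_sample = keys[::1000]  # a sample spread evenly across the key space
balanced = reducer_counts(keys, split_points(random_sample, 4))

print(skewed)    # last reducer receives almost everything
print(balanced)  # each reducer receives an equal share
```

With four reducers, the head sample yields split points of 150, 300, and 450, so the last reducer processes 599,550 of the 600,000 rows; the spread sample gives each reducer exactly 150,000.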
[jira] [Commented] (HIVE-7642) Set hive input format by configuration.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092598#comment-14092598 ] Hive QA commented on HIVE-7642: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660946/HIVE-7642.2-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5856 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/27/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/27/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-27/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660946 Set hive input format by configuration.[Spark Branch] - Key: HIVE-7642 URL: https://issues.apache.org/jira/browse/HIVE-7642 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7642.1-spark.patch, HIVE-7642.2-spark.patch Currently the Hive input format is hard-coded as HiveInputFormat; we should set this parameter from configuration. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092606#comment-14092606 ] Hive QA commented on HIVE-7624: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660957/HIVE-7624.5-spark.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/28/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/28/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-28/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd 
/data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-28/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-spark-source ]] + [[ ! -d apache-svn-spark-source/.svn ]] + [[ ! -d apache-svn-spark-source ]] + cd apache-svn-spark-source + svn revert -R . Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java' ++ svn status --no-ignore ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target cli/target odbc/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1617233. At revision 1617233. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660957 Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at
[jira] [Commented] (HIVE-7142) Hive multi serialization encoding support
[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092610#comment-14092610 ] Chengxiang Li commented on HIVE-7142: - Hi, [~navis], this jira is trying to support table-level configurable encoding. I took a look at HIVE-6329; do you mean you want to implement column-level configurable encoding? If yes, that would be a quite different implementation, but it would be valuable as well, and I'm glad to see it. Hive multi serialization encoding support - Key: HIVE-7142 URL: https://issues.apache.org/jira/browse/HIVE-7142 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This jira is dedicated to supporting serialization/deserialization of all kinds of encoded data in the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: {code:sql} CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} or {code:sql} ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
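What the serialization.encoding property controls can be illustrated outside Hive: the same text maps to different byte sequences under GBK and UTF-8, so a reader must decode with the charset the bytes were written in. A minimal plain-Python sketch of the charset round-trip (not LazySimpleSerDe code):

```python
# Round-trip the same text through two charsets; a reader must use the
# charset the bytes were written with, which is what a table-level
# serialization.encoding setting tells the SerDe.
text = "\u4e2d\u6587"  # two CJK characters

gbk_bytes = text.encode("gbk")     # GBK: 2 bytes per CJK character
utf8_bytes = text.encode("utf-8")  # UTF-8: 3 bytes per CJK character

assert gbk_bytes != utf8_bytes
# Correct round-trips recover the original text:
assert gbk_bytes.decode("gbk") == text
assert utf8_bytes.decode("utf-8") == text
print(len(gbk_bytes), len(utf8_bytes))  # 4 6
```

Decoding GBK bytes as UTF-8 either fails or produces mojibake, which is why GBK-encoded files cannot simply be loaded into a UTF-8-only SerDe.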
[jira] [Assigned] (HIVE-7675) Implement native HiveMapFunction
[ https://issues.apache.org/jira/browse/HIVE-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li reassigned HIVE-7675: --- Assignee: Chengxiang Li Implement native HiveMapFunction Key: HIVE-7675 URL: https://issues.apache.org/jira/browse/HIVE-7675 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems: # ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes. # ExecMapper introduces extra API-level restrictions and processing logic. We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
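The single-process vs multi-threaded mismatch in point 1 can be sketched in a few lines. The example below is an illustrative Python analogy, not Hive code (the class and method names are invented): per-task state is held in thread-local storage, so concurrent tasks that share one function object do not clobber each other's state the way they would with plain instance fields.

```python
import threading

class ThreadConfinedMapper:
    """Analogy: per-task state kept in thread-local storage."""
    def __init__(self):
        self._local = threading.local()

    def configure(self):
        # Per-thread initialization, analogous to operator setup per task.
        self._local.rows_seen = 0

    def process(self, row):
        self._local.rows_seen += 1
        return row * 2

    def rows_seen(self):
        return self._local.rows_seen

mapper = ThreadConfinedMapper()  # one shared object, many task threads
results = {}

def run_task(task_id, num_rows):
    mapper.configure()
    for i in range(num_rows):
        mapper.process(i)
    results[task_id] = mapper.rows_seen()

threads = [threading.Thread(target=run_task, args=(t, 100)) for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # each task counted exactly its own 100 rows
```

With a plain instance field instead of threading.local, the four tasks would interleave increments on one shared counter, which is the kind of breakage a single-process design hits on a multi-threaded executor.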
[jira] [Commented] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092622#comment-14092622 ] Vaibhav Gumashta commented on HIVE-6847: I'll update the tests - errors seem related. The TestScratchDir tests are for the older HS2 scratch dir code. Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch Currently, the Hive server creates the scratch directory and changes its permissions to 777; however, this is not great with respect to security. We need to create user-specific scratch directories instead. Also refer to the 1st iteration of the patch in HIVE-6782 for the approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
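The direction described above, per-user directories with owner-only permissions instead of one world-writable (777) directory, can be sketched as follows. This is an illustrative Python sketch with a hypothetical helper name and a temporary root, not the patch itself:

```python
import os
import stat
import tempfile

def make_user_scratch(root, user):
    """Create a per-user scratch dir readable/writable only by its owner."""
    path = os.path.join(root, user)
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o700)  # rwx for the owner only, instead of 0o777
    return path

root = tempfile.mkdtemp()  # stand-in for the scratch-dir root
path = make_user_scratch(root, "alice")
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))
```

Because each user owns their own subdirectory, no directory needs to be world-writable for sessions of different users to coexist under one root.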
Re: Review Request 23320: HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23320/#review50174 --- service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java https://reviews.apache.org/r/23320/#comment87765 I'll move this back to HiveSessionImpl#open as this won't pick the doAs setting since open goes through the appropriate proxy (which has UGI.doAs). - Vaibhav Gumashta On Aug. 6, 2014, 4:11 p.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23320/ --- (Updated Aug. 6, 2014, 4:11 p.m.) Review request for hive, Navis Ryu, Sushanth Sowmyan, Szehon Ho, and Thejas Nair. Bugs: HIVE-7353 https://issues.apache.org/jira/browse/HIVE-7353 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-7353 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ff282c5 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 760777a ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ebf2443 service/src/java/org/apache/hive/service/cli/CLIService.java 80d7b82 service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java de54ca1 service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b39d64d service/src/java/org/apache/hive/service/cli/session/SessionManager.java c2f0495 service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java b009a88 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java be2eb01 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 98d75b5 service/src/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java PRE-CREATION service/src/java/org/apache/hive/service/server/ThreadWithGarbageCleanup.java PRE-CREATION Diff: https://reviews.apache.org/r/23320/diff/ Testing --- Manual testing using Yourkit. Thanks, Vaibhav Gumashta
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092649#comment-14092649 ] Hive QA commented on HIVE-7669: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660933/HIVE-7669.1.patch.txt {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5888 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/250/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/250/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-250/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660933 parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Attachments: HIVE-7669.1.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. {code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092739#comment-14092739 ] Hive QA commented on HIVE-4064: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660939/HIVE-4064.8.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5874 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/251/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/251/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-251/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660939 Handle db qualified names consistently across all HiveQL statements --- Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Navis Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt Hive doesn't consistently handle db qualified names across all HiveQL statements. 
While some HiveQL statements such as SELECT support DB qualified names, other such as CREATE INDEX doesn't. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7648) authorization api should provide table/db object for create table/dbname
[ https://issues.apache.org/jira/browse/HIVE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092793#comment-14092793 ] Hive QA commented on HIVE-7648: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660937/HIVE-7648.1.patch {color:red}ERROR:{color} -1 due to 1074 failed/errored test(s), 5890 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketizedhiveinputformat_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_3 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_case_sensitivity org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cast1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_join1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_nested_types org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_pad_convert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_serde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_union1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_varchar_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_colstats_all_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnarserde_create_shortcut org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl_dp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_tbllvl org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_binary org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_boolean 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_decimal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_double org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_empty_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_long org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_concatenate_inherit_table_location org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constprog_dp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constprog_type org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer5
[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore
[ https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092873#comment-14092873 ] Hive QA commented on HIVE-7532: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660945/HIVE-7532.6.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5889 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/253/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/253/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-253/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660945 allow disabling direct sql per query with external metastore Key: HIVE-7532 URL: https://issues.apache.org/jira/browse/HIVE-7532 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, HIVE-7532.2.patch.txt, HIVE-7532.3.patch.txt, HIVE-7532.4.patch.txt, HIVE-7532.5.patch.txt, HIVE-7532.6.patch.txt Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4629: --- Assignee: Dong Chen (was: Shreepadma Venugopalan) HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Dong Chen Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092876#comment-14092876 ] Brock Noland commented on HIVE-4629: Nice work [~dongc]!! [~thejas] [~cwsteinbach] you two had some good feedback on the earlier design. Can you take a look at the latest patch? [~romainr], I know Hue uses this API, do you want to take a look? HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Dong Chen Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092900#comment-14092900 ] Brock Noland commented on HIVE-7624: Nice work!! bq. The patch does not appear to apply with p0, p1, or p2 Looks like the patch needs to be rebased. Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in the wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24445: HIVE-7642, Set hive input format by configuration.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24445/#review50178 --- Thank you so much! We can commit this very soon. Just two small nits below and then we'll commit this. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/24445/#comment87773 nit: Can the right-hand side here use StringUtils.isBlank so we can avoid the double negative? https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html#isBlank(java.lang.String) ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/24445/#comment87774 nit: How about changing this to: String msg = "Failed to load specified input format class: " + inpFormat; LOG.error(msg, e); throw new HiveException(msg, e); which might provide better information to our users? - Brock Noland On Aug. 11, 2014, 7:45 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24445/ --- (Updated Aug. 11, 2014, 7:45 a.m.) Review request for hive, Brock Noland and Szehon Ho. Bugs: HIVE-7642 https://issues.apache.org/jira/browse/HIVE-7642 Repository: hive-git Description --- Currently the Hive input format is hard-coded as HiveInputFormat; we should set this parameter from configuration. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 45eff67 Diff: https://reviews.apache.org/r/24445/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 24497: HIVE-7629 - Map joins between two parquet tables failing
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24497/#review50182 --- Thank you very much! Two comments below. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java https://reviews.apache.org/r/24497/#comment87783 nit: Missing space between if and ( ql/src/test/queries/clientpositive/parquet_join.q https://reviews.apache.org/r/24497/#comment87782 Can you add comments (start with --) which describe how this reproduces the bug? - Brock Noland On Aug. 8, 2014, 6:21 a.m., Suma Shivaprasad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24497/ --- (Updated Aug. 8, 2014, 6:21 a.m.) Review request for hive. Bugs: HIVE-7629 https://issues.apache.org/jira/browse/HIVE-7629 Repository: hive-git Description --- Map Joins between 2 parquet tables are failing since the Mapper is trying to access the columns of the first table (the bigger table) while trying to load the second table (the smaller map-join table). Fixed this by adding a guard on the column indexes passed by Hive. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 2f155f6 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java d6be4bd ql/src/test/queries/clientpositive/parquet_join.q PRE-CREATION ql/src/test/results/clientpositive/parquet_join.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24497/diff/ Testing --- parquet_join.q covers most types of joins between 2 parquet tables - Normal, Map join, SMB join Thanks, Suma Shivaprasad
[jira] [Commented] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported
[ https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092991#comment-14092991 ] Ashutosh Chauhan commented on HIVE-7160: I think the design issue here is the one raised in HIVE-7632. {{Vectorizer}} currently inserts casts and then evaluates them so that the types of all operands match for the UDF. It does so because Hive currently doesn't upcast operands while it does semantic checking, and leaves this to runtime, where it is achieved mainly via the logic in {{GenericUDFBaseNumeric}}. Instead of delegating type casting to runtime, this should happen at compile time, when we are doing type checking, which should upcast operands as necessary. Once we do this in {{TypeCheckProcFactory}} there will be no need to insert and evaluate casts later in compilation (like the vectorizer) or at runtime (GenericUDFOpNumeric). Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported -- Key: HIVE-7160 URL: https://issues.apache.org/jira/browse/HIVE-7160 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis Priority: Minor Attachments: HIVE-7160.1.patch.txt Simple UDF missing vectorization - a simple example would be: hive> explain select concat( l_orderkey, ' msecs') from lineitem; is not vectorized, while hive> explain select concat(cast(l_orderkey as string), ' msecs') from lineitem; can be vectorized. {code} 14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf found for GenericUDFConcat, descriptor: Argument Count = 2, mode = PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = {COLUMN,COLUMN} 14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is not supported at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
Need help on HIVE-7653 (AvroSerde)
Hi, I submitted a patch for the following issue: https://issues.apache.org/jira/browse/HIVE-7653 But the build is failing due to some other issue. It's been failing for the past 70 builds or so and I don't think it's related to my change. Also, my local build is passing. Can someone please help me override/fix this test failure? Thanks Sachin
[jira] [Updated] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4064: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution [~navis]! This was some long-pending cleanup! Handle db qualified names consistently across all HiveQL statements --- Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, others such as CREATE INDEX don't. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7678: Description: HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7678) add more test cases for tables qualified with database/schema name
Thejas M Nair created HIVE-7678: --- Summary: add more test cases for tables qualified with database/schema name Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093032#comment-14093032 ] Venki Korukanti commented on HIVE-7658: --- Hive uses ClassLoader.getResource("hive-site.xml") to find the path to the hive-site.xml file. The ClassLoader is retrieved using Thread.currentThread().getContextClassLoader(), which returns a chain of class loaders. One of the ClassLoaders in the chain is sun.misc.Launcher$AppClassLoader. This particular ClassLoader treats an empty entry in the classpath (example: /path/to/jar1.jar::/path/to/jar2) as the current working directory of the process (see [here|https://community.oracle.com/thread/2456122?start=0tstart=0]). If you look at the classpath of the Hive process, there is one such empty entry after the hadoop jars and before the hive conf dir and hive jars. As the empty entry comes before the hive conf directory, the ClassLoader picks up the first occurrence of hive-site.xml in the current working directory. Looking at the Hive scripts, the empty path is introduced by the hive scripts themselves. The following line in the {{bin/hive}} script causes an extra ':' before the hive-constructed classpath when HADOOP_CLASSPATH is empty; the hadoop script then adds another ':' to its classpath and appends the given hive classpath.
{code} export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${CLASSPATH} {code} Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Priority: Minor When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. Here's an example -

/home/spurija/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example1</value>
  </property>
</configuration>

/tmp/hive/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example2</value>
  </property>
</configuration>

-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>

{ check the value of scratchdir, should be example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ run with a specified config, check the value of scratchdir, should be example2 … still reported as example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ remove the local config, check the value of scratchdir, should be example2 … now correct }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2

Is this expected behavior or should it use the directory supplied with --config as the preferred configuration? -- This message was sent by Atlassian JIRA (v6.2#6252)
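The empty-classpath-entry problem described in the comment above, and the guard against it, can be sketched in plain shell. This is an illustrative sketch only: the jar paths below are made up, and the real bin/hive and hadoop scripts do more than this.

```shell
# Mimic the offending line from bin/hive when HADOOP_CLASSPATH is empty.
CLASSPATH="/path/to/hive-conf:/path/to/hive-exec.jar"
HADOOP_CLASSPATH=""

# Unconditional concatenation leaves a leading ':' -- an empty classpath
# entry, which the JVM's AppClassLoader resolves to the current directory.
broken="${HADOOP_CLASSPATH}:${CLASSPATH}"
echo "$broken"    # :/path/to/hive-conf:/path/to/hive-exec.jar

# Guarded version: only prepend HADOOP_CLASSPATH when it is non-empty,
# so no empty entry is ever produced.
if [ -n "${HADOOP_CLASSPATH}" ]; then
  fixed="${HADOOP_CLASSPATH}:${CLASSPATH}"
else
  fixed="${CLASSPATH}"
fi
echo "$fixed"     # /path/to/hive-conf:/path/to/hive-exec.jar
```

With the guard in place, the hive conf directory is the first place the class loader finds hive-site.xml, rather than the process's working directory.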
[jira] [Updated] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-7541: -- Attachment: HIVE-7541.2-spark.patch Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7658: -- Attachment: HIVE-7658.1.patch Attached patch resolves the issue by checking whether HADOOP_CLASSPATH is non-empty before using it. Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Priority: Minor Attachments: HIVE-7658.1.patch When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti reassigned HIVE-7658: - Assignee: Venki Korukanti Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Assignee: Venki Korukanti Priority: Minor Attachments: HIVE-7658.1.patch When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093038#comment-14093038 ] Na Yang commented on HIVE-7541: --- Hi Szehon, Thank you for the comments. Please review the new patch. Thanks, Na Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete
[ https://issues.apache.org/jira/browse/HIVE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7646: - Attachment: delete.patch Changes to parse DELETE. Modify parser to support new grammar for Insert,Update,Delete - Key: HIVE-7646 URL: https://issues.apache.org/jira/browse/HIVE-7646 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: delete.patch Need the parser to recognize constructs such as INSERT INTO Cust (Customer_Number, Balance, Address) VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave'); DELETE FROM Cust WHERE Balance > 5.0; UPDATE Cust SET column1=value1,column2=value2,... WHERE some_column=some_value; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7142) Hive multi serialization encoding support
[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093069#comment-14093069 ] Hive QA commented on HIVE-7142: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660943/HIVE-7142.3.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/254/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/254/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-254/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660943 Hive multi serialization encoding support - Key: HIVE-7142 URL: https://issues.apache.org/jira/browse/HIVE-7142 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load other kinds of encoded data into Hive directly. This jira is dedicated to supporting serialization/deserialization of all kinds of encoded data in the SerDe layer. 
Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: {code:sql} CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES('serialization.encoding'='GBK'); {code} or {code:sql} ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
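As context for why a table-level encoding setting matters: the same characters occupy different byte sequences in GBK and UTF-8, so a SerDe that assumes UTF-8 would misinterpret GBK-encoded rows. A small demonstration of the byte-level difference, assuming an iconv build with GBK support (standard on glibc systems):

```shell
# Two Chinese characters: 3 bytes each in UTF-8, 2 bytes each in GBK.
utf8_len=$(printf '你好' | wc -c)
gbk_len=$(printf '你好' | iconv -f UTF-8 -t GBK | wc -c)
echo "UTF-8 bytes: $utf8_len"   # 6
echo "GBK bytes:   $gbk_len"    # 4
```

A charset-aware SerDe has to perform the equivalent of the iconv step when reading or writing table data; with 'serialization.encoding'='GBK', LazySimpleSerDe decodes the 4-byte form instead of rejecting it as invalid UTF-8.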
[jira] [Updated] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-7541: -- Attachment: (was: HIVE-7541.2-spark.patch) Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-7541: -- Attachment: HIVE-7541.2-spark.patch Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4806) Add more implementations of JDBC API methods to Hive and Hive2 drivers
[ https://issues.apache.org/jira/browse/HIVE-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093076#comment-14093076 ] Hive QA commented on HIVE-4806: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12590514/HIVE-4806.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/255/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/255/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-255/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-255/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java' Reverted 'serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Umetastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java Umetastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java Umetastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java Umetastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
Umetastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java Uql/src/java/org/apache/hadoop/hive/ql/plan/ShowGrantDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableAlterPartDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/AlterIndexDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/PrivilegeObjectDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/RenamePartitionDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/ShowColumnsDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableSimpleDesc.java Uql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java Uql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java Uql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java Uql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java U ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java U
[jira] [Commented] (HIVE-7651) Investigate why union two RDDs generated from two MapTrans does not get the right result
[ https://issues.apache.org/jira/browse/HIVE-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093083#comment-14093083 ] Na Yang commented on HIVE-7651: --- This issue was caused by a single jobConf instance being used by multiple MapTrans. The fix is included in the patch of HIVE-7541. Investigate why union two RDDs generated from two MapTrans does not get the right result Key: HIVE-7651 URL: https://issues.apache.org/jira/browse/HIVE-7651 Project: Hive Issue Type: Bug Components: Spark Reporter: Na Yang If the SparkWork has two map works as root and the current generate(basework) API is used to generate two MapTrans, then unioning the RDDs processed by the two MapTrans does not produce the correct result. If the two input RDDs come from different data tables, the union result is empty; if they come from the same data table, the union result is incorrect: the same row of data appears 4 times in the union result. Need to investigate why this happens and how to fix it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7651) Investigate why union two RDDs generated from two MapTrans does not get the right result
[ https://issues.apache.org/jira/browse/HIVE-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang resolved HIVE-7651. --- Resolution: Implemented Assignee: Na Yang Investigate why union two RDDs generated from two MapTrans does not get the right result Key: HIVE-7651 URL: https://issues.apache.org/jira/browse/HIVE-7651 Project: Hive Issue Type: Bug Components: Spark Reporter: Na Yang Assignee: Na Yang If the SparkWork has two map works as root and the current generate(basework) API is used to generate two MapTrans, then unioning the RDDs processed by the two MapTrans does not produce the correct result. If the two input RDDs come from different data tables, the union result is empty; if they come from the same data table, the union result is incorrect: the same row of data appears 4 times in the union result. Need to investigate why this happens and how to fix it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4806) Add more implementations of JDBC API methods to Hive and Hive2 drivers
[ https://issues.apache.org/jira/browse/HIVE-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093105#comment-14093105 ] Alexander Pivovarov commented on HIVE-4806: --- I found at least 2 issues with HIVE-4806.patch. 1. getIdentifierQuoteString returns ' (a single quote). Actually there is only limited support for quoting identifiers in Hive: you can quote column names but not database or table names. That means identifier quoting is not fully supported, and getIdentifierQuoteString should most probably return a space (according to the JDBC spec, this method returns a space if identifier quoting is not supported). In that case SQL clients will generate correct SQL statements (without quotes for column, table and database names). 2. isReadOnly returns true, and the method description says "Returns a true as the database meta data is readonly." In fact the JDBC spec defines this method as "Retrieves whether this database is in read-only mode." So it is about the database, not about the metadata. In most cases Hive databases are NOT read-only: we can run CREATE TABLE AS SELECT, INSERT INTO TABLE, INSERT OVERWRITE. I think isReadOnly should return false. Look at my patch HIVE-7676. Add more implementations of JDBC API methods to Hive and Hive2 drivers -- Key: HIVE-4806 URL: https://issues.apache.org/jira/browse/HIVE-4806 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.11.0 Reporter: Matt Burgess Assignee: Matt Burgess Attachments: HIVE-4806.patch Third-party client software such as Pentaho Data Integration (PDI) uses many different JDBC API calls when interacting with JDBC data sources. Several of these calls have not yet been implemented in the Hive and Hive 2 drivers and by default will throw "Method not supported" SQLExceptions when there could be default implementations instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
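To see why the returned quote string matters, here is a hedged sketch of how a generic JDBC client typically uses getIdentifierQuoteString: per the JDBC spec, a single space means identifier quoting is unsupported, so the client must emit the identifier bare. The helper below is illustrative and not part of any driver.

```java
// Illustrative sketch of client-side identifier quoting driven by
// DatabaseMetaData.getIdentifierQuoteString(). A single space (" ")
// signals "quoting not supported" per the JDBC specification.
class QuoteSketch {
    static String quoteIdentifier(String identifier, String quoteString) {
        if (" ".equals(quoteString)) {
            return identifier;                      // quoting not supported: emit bare
        }
        return quoteString + identifier + quoteString;
    }

    public static void main(String[] args) {
        System.out.println(quoteIdentifier("col", "`")); // quoted form
        System.out.println(quoteIdentifier("col", " ")); // bare form
    }
}
```

This is why returning ' from a driver that cannot actually quote table or database names leads clients to generate invalid SQL, while returning a space keeps the generated statements unquoted and valid.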
[jira] [Created] (HIVE-7679) JOIN operator should update the column stats when number of rows changes
Prasanth J created HIVE-7679: Summary: JOIN operator should update the column stats when number of rows changes Key: HIVE-7679 URL: https://issues.apache.org/jira/browse/HIVE-7679 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor JOIN operator does not update the column stats when the number of rows changes. All other operators scale up/down the column statistics when the number of rows changes. The same should be done for the JOIN operator as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7679) JOIN operator should update the column stats when number of rows changes
[ https://issues.apache.org/jira/browse/HIVE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7679: - Description: JOIN operator does not update the column stats when the number of rows changes. All other operators scale up/down the column statistics when the number of rows changes. The same should be done for the JOIN operator as well. Because of this, dataSize might become negative, as numNulls can get bigger than numRows (if scaling down of column stats is not done). (was: JOIN operator does not update the column stats when the number of rows changes. All other operators scales up/down the column statistics when the number of rows changes. Same should be done for JOIN operator as well. ) JOIN operator should update the column stats when number of rows changes Key: HIVE-7679 URL: https://issues.apache.org/jira/browse/HIVE-7679 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Fix For: 0.13.0 JOIN operator does not update the column stats when the number of rows changes. All other operators scale up/down the column statistics when the number of rows changes. The same should be done for the JOIN operator as well. Because of this, dataSize might become negative, as numNulls can get bigger than numRows (if scaling down of column stats is not done). -- This message was sent by Atlassian JIRA (v6.2#6252)
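A minimal sketch of the scaling rule described above, under the assumption that numNulls is scaled proportionally with the row count and capped at numRows, so that a derived dataSize can never go negative. The method name is hypothetical, not Hive's actual StatsUtils API.

```java
// Illustrative sketch: when an operator changes the estimated row count,
// per-column numNulls must be rescaled too, and capped at the new row
// count so numNulls can never exceed numRows.
class StatsScaleSketch {
    static long scaleNumNulls(long numNulls, long oldNumRows, long newNumRows) {
        if (oldNumRows <= 0) {
            return 0;                                   // no basis for scaling
        }
        long scaled = (long) Math.ceil((double) numNulls * newNumRows / oldNumRows);
        return Math.min(scaled, newNumRows);            // cap at numRows
    }

    public static void main(String[] args) {
        // 100 nulls out of 1000 rows, scaled down to 10 rows.
        System.out.println(scaleNumNulls(100, 1000, 10));
    }
}
```

Without the proportional scaling (or the cap), a JOIN that reduces numRows while keeping the old numNulls produces exactly the negative-dataSize symptom the issue describes.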
[jira] [Commented] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093124#comment-14093124 ] Sergey Shelukhin commented on HIVE-7366: The comment correction for isConfigEnabled is not quite correct; we still use it in transactions if enabled. That code checks 2 config settings. Can you post an RB? getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093129#comment-14093129 ] Yin Huai commented on HIVE-7205: Yeah, fixing correctness bug is very important. However, the current patch also introduces a significant refactoring of the query evaluation path. I am not sure if this refactoring will not break other things. [~navis] Can you post a summary of how those operators work with your refactoring? Wrong results when union all of grouping followed by group by with correlation optimization --- Key: HIVE-7205 URL: https://issues.apache.org/jira/browse/HIVE-7205 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: dima machlin Assignee: Navis Priority: Critical Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt, HIVE-7205.3.patch.txt use case : table TBL (a string,b string) contains single row : 'a','a' the following query : {code:sql} select b, sum(cc) from ( select b,count(1) as cc from TBL group by b union all select a as b,count(1) as cc from TBL group by a ) z group by b {code} returns a 1 a 1 while set hive.optimize.correlation=true; if we change set hive.optimize.correlation=false; it returns correct results : a 2 The plan with correlation optimization : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc 
(TOK_GROUPBY (TOK_TABLE_OR_COL b STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: null-subquery1:z-subquery1:TBL TableScan alias: TBL Select Operator expressions: expr: b type: string outputColumnNames: b Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: b type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint null-subquery2:z-subquery2:TBL TableScan alias: TBL Select Operator expressions: expr: a type: string outputColumnNames: a Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: a type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string
[jira] [Commented] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.
[ https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093143#comment-14093143 ] Shivaraju Gowda commented on HIVE-6486: --- Should that closing curly bracket be included in the doc? Yes, that would help by making the method self-contained. Thanks for checking and documenting it. Support secure Subject.doAs() in HiveServer2 JDBC client. - Key: HIVE-6486 URL: https://issues.apache.org/jira/browse/HIVE-6486 Project: Hive Issue Type: Improvement Components: Authentication, HiveServer2, JDBC Affects Versions: 0.11.0, 0.12.0 Reporter: Shivaraju Gowda Assignee: Shivaraju Gowda Fix For: 0.13.0 Attachments: HIVE-6486.1.patch, HIVE-6486.2.patch, HIVE-6486.3.patch, HIVE-6486_Hive0.11.patch, TestCase_HIVE-6486.java HIVE-5155 addresses the problem of kerberos authentication in multi-user middleware server using proxy user. In this mode the principal used by the middle ware server has privileges to impersonate selected users in Hive/Hadoop. This enhancement is to support Subject.doAs() authentication in Hive JDBC layer so that the end users Kerberos Subject is passed through in the middle ware server. With this improvement there won't be any additional setup in the server to grant proxy privileges to some users and there won't be need to specify a proxy user in the JDBC client. This version should also be more secure since it won't require principals with the privileges to impersonate other users in Hive/Hadoop setup. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-7366: --- Status: Open (was: Patch Available) Unsetting patch-available, since some of the errors reported are relevant to this patch. Looking into it. getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7616: --- Attachment: HIVE-7616.07.patch fix forgotten test output pre-size mapjoin hashtable based on statistics -- Key: HIVE-7616 URL: https://issues.apache.org/jira/browse/HIVE-7616 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, HIVE-7616.06.patch, HIVE-7616.07.patch, HIVE-7616.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093156#comment-14093156 ] Sushanth Sowmyan commented on HIVE-7366: Will do. I still need to update the patch a bit, and will upload it to rb with that. getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore
[ https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093157#comment-14093157 ] Sergey Shelukhin commented on HIVE-7532: +1 allow disabling direct sql per query with external metastore Key: HIVE-7532 URL: https://issues.apache.org/jira/browse/HIVE-7532 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, HIVE-7532.2.patch.txt, HIVE-7532.3.patch.txt, HIVE-7532.4.patch.txt, HIVE-7532.5.patch.txt, HIVE-7532.6.patch.txt Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093168#comment-14093168 ] Szehon Ho commented on HIVE-7541: - Thanks, can you upload the new patch to the review board too so it's easier to look at? Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
ArrayWritableGroupConverter
Hi, I was just wondering how come the field count has to be either 1 or 2? I'm trying to read a column where the number of fields is 3, and I'm getting an invalid parquet hive schema error (in Hive 0.12) when I try to do so. It looks like it links back to here: https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java Thanks, -Raymond
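For context, a hedged reading of why the field count is restricted to 1 or 2 in that converter: a repeated group with one field is treated as a list element, and one with two fields as a map key/value pair, so any other count has no mapping to a Hive type and is rejected. This is an interpretation of the linked code, sketched with illustrative names, not the converter's actual implementation.

```java
// Illustrative sketch of the field-count constraint in the Hive Parquet
// array/map group converter: 1 field -> list element, 2 fields -> map
// key/value pair, anything else -> invalid schema.
class GroupFieldSketch {
    static String interpretFieldCount(int fieldCount) {
        if (fieldCount == 1) {
            return "array element";
        }
        if (fieldCount == 2) {
            return "map key/value pair";
        }
        throw new IllegalStateException(
            "Invalid parquet hive schema: repeated group has " + fieldCount + " fields");
    }

    public static void main(String[] args) {
        System.out.println(interpretFieldCount(1));
        System.out.println(interpretFieldCount(2));
    }
}
```

Under this reading, a 3-field repeated group would need to be modeled as a list of structs rather than flattened fields for the converter to accept it.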
[jira] [Updated] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7658: -- Status: Patch Available (was: Open) Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Assignee: Venki Korukanti Priority: Minor Attachments: HIVE-7658.1.patch When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. 
Here's an example -

/home/spurija/hive-site.xml =
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example1</value>
  </property>
</configuration>

/tmp/hive/hive-site.xml =
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example2</value>
  </property>
</configuration>

-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>

{ check the value of scratchdir, should be example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ run with a specified config, check the value of scratchdir, should be example2 … still reported as example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ remove the local config, check the value of scratchdir, should be example2 … now correct }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2

Is this expected behavior or should it use the directory supplied with --config as the preferred configuration? -- This message was sent by Atlassian JIRA (v6.2#6252)
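The behavior in the report is consistent with first-match-wins classpath resolution: whichever directory containing a hive-site.xml appears first on the effective classpath supplies the configuration, so a copy in the current working directory can shadow the one passed via --config. A small sketch of that resolution order, with hypothetical names (this is not Hive's actual loading code):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of first-match-wins config resolution: walk the
// classpath entries in order and return the first hive-site.xml found.
class ConfigOrderSketch {
    static String resolveHiveSite(List<String> classpath, List<String> dirsWithHiveSite) {
        for (String dir : classpath) {
            if (dirsWithHiveSite.contains(dir)) {
                return dir + "/hive-site.xml";   // first match wins
            }
        }
        return null;                             // no hive-site.xml anywhere
    }

    public static void main(String[] args) {
        // The cwd precedes the --config directory on the classpath.
        List<String> cp = Arrays.asList("/home/spurija", "/tmp/hive");
        System.out.println(resolveHiveSite(cp, Arrays.asList("/home/spurija", "/tmp/hive")));
        System.out.println(resolveHiveSite(cp, Arrays.asList("/tmp/hive")));
    }
}
```

Removing the copy in the current working directory removes the earlier match, which matches the reporter's observation that the --config value only takes effect after `rm hive-site.xml`.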
Re: Review Request 24377: HIVE-7142 Hive multi serialization encoding support
On Aug. 11, 2014, 4:52 a.m., Brock Noland wrote: serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java, line 43 https://reviews.apache.org/r/24377/diff/3/?file=653662#file653662line43 Can we make these constants? serialization.encoding is probably already available somewhere. chengxiang li wrote: add serialization.encoding to serdeConstant class if that's what you mean here. That file is auto-generated. In order to add the constant there, you'll have to edit serde/if/serde.thrift and then re-generate. - Brock --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/#review50145 --- On Aug. 11, 2014, 7:30 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/ --- (Updated Aug. 11, 2014, 7:30 a.m.) Review request for hive. Bugs: HIVE-7142 https://issues.apache.org/jira/browse/HIVE-7142 Repository: hive-git Description --- Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This jira is dedicated to supporting serialization/deserialization of all kinds of encoded data in the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe property, for example: CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES('serialization.encoding'='GBK'); or ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. 
Diffs - serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25 serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java 179f9b5 serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java b7fb048 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java fb55c70 Diff: https://reviews.apache.org/r/24377/diff/ Testing --- Thanks, chengxiang li
[jira] [Updated] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7678: Attachment: HIVE-7678.1.patch add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7678.1.patch HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7678: Status: Patch Available (was: Open) add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7678.1.patch HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093201#comment-14093201 ] Thejas M Nair commented on HIVE-7678: - Added test cases for 'show partition', 'show table properties', msck. But I found parse issues in several alter table commands when qualified table names are used, I will open another jira for that. add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7678.1.patch HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093203#comment-14093203 ] Thejas M Nair commented on HIVE-7678: - Some of the tests are taken from HIVE-3589. add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7678.1.patch HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7680) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false)
Alexander Pivovarov created HIVE-7680: - Summary: Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false) Key: HIVE-7680 URL: https://issues.apache.org/jira/browse/HIVE-7680 Project: Hive Issue Type: Bug Components: JDBC Reporter: Alexander Pivovarov Priority: Minor 1. Some JDBC clients call the method setEscapeProcessing(false) (e.g. SQL Workbench). It looks like setEscapeProcessing(false) should do nothing, so let's do nothing instead of throwing SQLException. 2. getMoreResults is needed in case a Statement returns several ResultSets. Hive does not support multiple ResultSets, so this method can safely always return false. 3. getUpdateCount. Currently this method always returns 0. Hive cannot tell us how many rows were inserted. According to the JDBC spec it should return -1 if the current result is a ResultSet object or there are no more results. If this method returns 0, then after executing an insert statement the JDBC client shows that 0 rows were inserted, which is not true. If this method returns -1, then the JDBC client runs insert statements and shows that they executed successfully with no results returned. I think the latter behaviour is more correct. 4. Some methods in the Statement class should throw SQLFeatureNotSupportedException if they are not supported. The current implementation throws SQLException instead, which indicates a database access error. -- This message was sent by Atlassian JIRA (v6.2#6252)
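The getUpdateCount semantics argued for above can be sketched as pure logic: -1 signals "no count available" (the current result is a ResultSet, or there are no more results), which a client treats differently from "0 rows affected". Names below are illustrative, not HiveStatement's actual code.

```java
// Illustrative sketch of how a JDBC client interprets the pair
// (execute() returned a ResultSet?, getUpdateCount() value) per the
// JDBC spec: -1 means "no count", which is distinct from 0.
class UpdateCountSketch {
    static String interpret(boolean isResultSet, int updateCount) {
        if (isResultSet) {
            return "result set";            // query produced rows to fetch
        }
        if (updateCount == -1) {
            return "no more results";       // e.g. Hive cannot report a count
        }
        return updateCount + " rows affected";
    }

    public static void main(String[] args) {
        System.out.println(interpret(false, -1));
        System.out.println(interpret(false, 0));
    }
}
```

This is why returning 0 from HiveStatement.getUpdateCount makes clients report "0 rows inserted" for a successful insert, while -1 lets them report success without a (wrong) count.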
[jira] [Created] (HIVE-7681) qualified tablenames usage does not work with several alter-table commands
Thejas M Nair created HIVE-7681: --- Summary: qualified tablenames usage does not work with several alter-table commands Key: HIVE-7681 URL: https://issues.apache.org/jira/browse/HIVE-7681 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Changes were made in HIVE-4064 for use of qualified table names in more types of queries. But several alter table commands don't work with qualified names: - alter table default.tmpfoo set tblproperties (bar = bar value) - ALTER TABLE default.kv_rename_test CHANGE a a STRING - add/drop partition - alter index rebuild -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7627) FSStatsPublisher does not fit into Spark multi-thread task mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7627: --- Description: Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy19.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
[jira] [Updated] (HIVE-7680) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false)
[ https://issues.apache.org/jira/browse/HIVE-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7680: -- Attachment: HIVE-7680.patch Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false) - Key: HIVE-7680 URL: https://issues.apache.org/jira/browse/HIVE-7680 Project: Hive Issue Type: Bug Components: JDBC Reporter: Alexander Pivovarov Priority: Minor Attachments: HIVE-7680.patch 1. Some JDBC clients (e.g. SQL Workbench) call the method setEscapeProcessing(false). It looks like setEscapeProcessing(false) should do nothing, so let's do nothing instead of throwing SQLException. 2. getMoreResults is needed in case a Statement returns several ResultSets. Hive does not support multiple ResultSets, so this method can safely always return false. 3. getUpdateCount. Currently this method always returns 0. Hive cannot tell us how many rows were inserted. According to the JDBC spec it should return -1 if the current result is a ResultSet object or there are no more results. If this method returns 0, then after executing an insert statement the JDBC client shows that 0 rows were inserted, which is not true. If this method returns -1, the JDBC client runs the insert statement and shows that it was executed successfully with no results returned. I think the latter behaviour is more correct. 4. Some methods in the Statement class should throw SQLFeatureNotSupportedException if they are not supported. The current implementation throws SQLException instead, which indicates a database access error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7680) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false)
[ https://issues.apache.org/jira/browse/HIVE-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7680: -- Status: Patch Available (was: Open) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false) - Key: HIVE-7680 URL: https://issues.apache.org/jira/browse/HIVE-7680 Project: Hive Issue Type: Bug Components: JDBC Reporter: Alexander Pivovarov Priority: Minor Attachments: HIVE-7680.patch 1. Some JDBC clients (e.g. SQL Workbench) call the method setEscapeProcessing(false). It looks like setEscapeProcessing(false) should do nothing, so let's do nothing instead of throwing SQLException. 2. getMoreResults is needed in case a Statement returns several ResultSets. Hive does not support multiple ResultSets, so this method can safely always return false. 3. getUpdateCount. Currently this method always returns 0. Hive cannot tell us how many rows were inserted. According to the JDBC spec it should return -1 if the current result is a ResultSet object or there are no more results. If this method returns 0, then after executing an insert statement the JDBC client shows that 0 rows were inserted, which is not true. If this method returns -1, the JDBC client runs the insert statement and shows that it was executed successfully with no results returned. I think the latter behaviour is more correct. 4. Some methods in the Statement class should throw SQLFeatureNotSupportedException if they are not supported. The current implementation throws SQLException instead, which indicates a database access error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7676) Support more methods in HiveDatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676: -- Status: Patch Available (was: Open) Support more methods in HiveDatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.14.0 supports UNION ALL and does not support UNION. We can indicate this in HiveDatabaseMetaData instead of throwing a Method Not supported exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-7366: --- Attachment: HIVE-7366.2.patch Updated patch to fix test failures - the test failures were due to my change not taking into account recent role/owner changes. getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.2.patch, HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093230#comment-14093230 ] Sushanth Sowmyan commented on HIVE-7366: [~sershe], I've created a reviewboard link for the latest patch : https://reviews.apache.org/r/24574/ getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.2.patch, HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7676) Support more methods in HiveDatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093234#comment-14093234 ] Alexander Pivovarov commented on HIVE-7676: --- patch content: 1. getIdentifierQuoteString returns space 2. getIndexInfo returns empty ResultSet similar to getPrimaryKeys 3.1 supportsFullOuterJoins = true 3.2 supportsLimitedOuterJoins = true 4.1 supportsSchemasInDataManipulation = true 4.2 supportsSchemasInTableDefinitions = true 5.1 supportsUnion = false 5.2 supportsUnionAll = true 6. HiveResultSetMetaData.isReadOnly = true Support more methods in HiveDatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.13.1 Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.14.0 supports UNION ALL and does not support UNION. We can indicate this in HiveDatabaseMetaData instead of throwing a Method Not supported exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
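The capability flags listed in the HIVE-7676 patch content above can be sketched as follows. The method names mirror java.sql.DatabaseMetaData, but this is an illustrative standalone class, not the actual HiveDatabaseMetaData source:

```java
// Illustrative sketch of the capability flags the HIVE-7676 patch sets;
// not the real org.apache.hive.jdbc.HiveDatabaseMetaData implementation.
class MetaDataCapabilitiesSketch {
    boolean supportsFullOuterJoins()            { return true;  }
    boolean supportsLimitedOuterJoins()         { return true;  }
    boolean supportsSchemasInDataManipulation() { return true;  }
    boolean supportsSchemasInTableDefinitions() { return true;  }
    // Per the ticket: Hive supports UNION ALL but not plain UNION.
    boolean supportsUnion()                     { return false; }
    boolean supportsUnionAll()                  { return true;  }

    // Per the DatabaseMetaData javadoc, return a single space when
    // identifier quoting is not supported.
    String getIdentifierQuoteString()           { return " ";   }

    public static void main(String[] args) {
        MetaDataCapabilitiesSketch m = new MetaDataCapabilitiesSketch();
        System.out.println(m.supportsUnionAll());              // true
        System.out.println(m.getIdentifierQuoteString().length()); // 1
    }
}
```

Returning accurate flags instead of throwing matters because clients such as SQuirreL SQL probe these methods up front and hide whole features (e.g. the schema list) when a probe throws.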
[jira] [Created] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required
Brock Noland created HIVE-7682: -- Summary: HadoopThriftAuthBridge20S should not reset configuration unless required Key: HIVE-7682 URL: https://issues.apache.org/jira/browse/HIVE-7682 Project: Hive Issue Type: Bug Reporter: Brock Noland In the HadoopThriftAuthBridge20S methods createClientWithConf and getCurrentUGIWithConf we create new Configuration objects so we can set the authentication type. When the new Configuration object is loaded, it looks for the core-site.xml of the cluster it's connected to. This causes issues for Oozie, since Oozie does not have access to that core-site.xml, as it is cluster-agnostic. -- This message was sent by Atlassian JIRA (v6.2#6252)
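The construction-pattern difference behind HIVE-7682 can be sketched with a hypothetical stand-in class (ConfSketch is not Hadoop's real org.apache.hadoop.conf.Configuration; it only models the two constructors): building a configuration from scratch pulls in classpath defaults such as core-site.xml, while copying the caller's existing configuration avoids that re-read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration class, used only to
// illustrate the two construction patterns discussed in the ticket.
class ConfSketch {
    private final Map<String, String> props = new HashMap<>();
    private final boolean loadedClusterDefaults;

    // Mirrors `new Configuration()`: loads classpath resources such as
    // core-site.xml -- the behavior that breaks cluster-agnostic Oozie.
    ConfSketch() {
        this.loadedClusterDefaults = true;
    }

    // Mirrors `new Configuration(existing)`: copies the caller's settings
    // without re-reading cluster files.
    ConfSketch(ConfSketch existing) {
        this.props.putAll(existing.props);
        this.loadedClusterDefaults = false;
    }

    void set(String key, String value) { props.put(key, value); }
    String get(String key)             { return props.get(key); }
    boolean loadedClusterDefaults()    { return loadedClusterDefaults; }
}
```

Under this sketch, the fix the ticket asks for amounts to preferring the copy constructor, setting the authentication type on the copy, rather than constructing a fresh configuration that resets everything from cluster files.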