[jira] [Created] (HIVE-9920) DROP DATABASE IF EXISTS throws exception if database does not exist
Chaoyu Tang created HIVE-9920: - Summary: DROP DATABASE IF EXISTS throws exception if database does not exist Key: HIVE-9920 URL: https://issues.apache.org/jira/browse/HIVE-9920 Project: Hive Issue Type: Bug Components: Logging, Metastore Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor "drop database if exists noexistingdb" throws and logs a full exception if the database (noexistingdb) does not exist: {code}
15/03/10 22:47:22 WARN metastore.ObjectStore: Failed to get database statsdb2, returning NoSuchObjectException
15/03/11 00:19:55 ERROR metastore.RetryingHMSHandler: NoSuchObjectException(message:statsdb2)
	at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:569)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
	at com.sun.proxy.$Proxy6.getDatabase(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:953)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:927)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy8.get_database(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1150)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
	at com.sun.proxy.$Proxy9.getDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1291)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:1364)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropDatabase(DDLSemanticAnalyzer.java:777)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:427)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1116)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1164)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1053)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9975) Renaming a nonexisting partition should not throw out NullPointerException
Chaoyu Tang created HIVE-9975: - Summary: Renaming a nonexisting partition should not throw out NullPointerException Key: HIVE-9975 URL: https://issues.apache.org/jira/browse/HIVE-9975 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Renaming a nonexisting partition should not throw a NullPointerException. {code}
create table testpart (col1 int, col2 string, col3 string) partitioned by (part string);
alter table testpart partition (part = 'nonexisting') rename to partition (part = 'existing');
{code} We get an NPE like the following: {code}
15/03/16 10:16:11 ERROR exec.DDLTask: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.DDLTask.renamePartition(DDLTask.java:944)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:350)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1642)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1402)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1187)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1053)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. null
{code}
[jira] [Created] (HIVE-10007) Support qualified table name in analyze table compute statistics for columns
Chaoyu Tang created HIVE-10007: -- Summary: Support qualified table name in analyze table compute statistics for columns Key: HIVE-10007 URL: https://issues.apache.org/jira/browse/HIVE-10007 Project: Hive Issue Type: Improvement Components: Query Processor, Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently the "analyze table compute statistics for columns" command cannot compute column stats for a table in a different database, since it does not support qualified table names. You need to switch to that table's database in order to compute its column stats. For example, you have to "use psqljira" and then "analyze table src compute statistics for columns" for the table src under psqljira. This JIRA will add support for qualified table names in the analyze column stats command.
[jira] [Created] (HIVE-10210) Compute partition column stats fails when partition value is zero-leading integer
Chaoyu Tang created HIVE-10210: -- Summary: Compute partition column stats fails when partition value is zero-leading integer Key: HIVE-10210 URL: https://issues.apache.org/jira/browse/HIVE-10210 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang The command "Analyze table .. partition compute statistics for columns" fails if the partition value is a non-normalized integer, e.g. one with leading zeros. For example: {code}
create table colstatspartint (key int, value string) partitioned by (part int);
insert into colstatspartint partition (part='0003') select key, value from src limit 30;
analyze table colstatspartint partition (part='0003') compute statistics for columns;
{code} or {code}
analyze table colstatspartint partition (part=0003) compute statistics for columns;
{code} You will get the error: {code}
15/04/03 10:13:19 ERROR metastore.RetryingHMSHandler: NoSuchObjectException(message:Partition for which stats is gathered doesn't exist.)
	at org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:5952)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
	at com.sun.proxy.$Proxy6.updatePartitionColumnStatistics(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_partition_column_statistics(HiveMetaStore.java:4346)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:5678)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
{code}
[jira] [Created] (HIVE-10231) Compute partition column stats fails if partition col type is date
Chaoyu Tang created HIVE-10231: -- Summary: Compute partition column stats fails if partition col type is date Key: HIVE-10231 URL: https://issues.apache.org/jira/browse/HIVE-10231 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Fix For: 1.2.0 Currently the command "analyze table .. partition .. compute statistics for columns" only works for partition columns of string and numeric types, but not others like date. The following case uses date as the partition column type: {code}
create table colstatspartdate (key int, value string) partitioned by (ds date, hr int);
insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20;
analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns;
{code} You will get a RuntimeException: {code}
FAILED: RuntimeException Cannot convert to Date from: int
15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int
java.lang.RuntimeException: Cannot convert to Date from: int
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
{code}
[jira] [Created] (HIVE-10307) Support to use number literals in partition column
Chaoyu Tang created HIVE-10307: -- Summary: Support to use number literals in partition column Key: HIVE-10307 URL: https://issues.apache.org/jira/browse/HIVE-10307 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as literals with a postfix like Y, S, L, or BD appended to the number. These literals work in most Hive queries, but do not when they are used as partition column values. For a partitioned table like: {code}
create table partcoltypenum (key int, value string) partitioned by (tint tinyint, sint smallint, bint bigint);
insert into partcoltypenum partition (tint=100Y, sint=1S, bint=1000L) select key, value from src limit 30;
{code} queries like select, describe and drop partition do not work. For example: {code}
select * from partcoltypenum where tint=100Y and sint=1S and bint=1000L;
{code} does not return any rows.
[jira] [Created] (HIVE-10313) Literal Decimal ExprNodeConstantDesc should contain value of HiveDecimal instead of String
Chaoyu Tang created HIVE-10313: -- Summary: Literal Decimal ExprNodeConstantDesc should contain value of HiveDecimal instead of String Key: HIVE-10313 URL: https://issues.apache.org/jira/browse/HIVE-10313 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang In TypeCheckProcFactory.NumExprProcessor, the ExprNodeConstantDesc is created from strVal: {code}
else if (expr.getText().endsWith("BD")) {
  // Literal decimal
  String strVal = expr.getText().substring(0, expr.getText().length() - 2);
  HiveDecimal hd = HiveDecimal.create(strVal);
  int prec = 1;
  int scale = 0;
  if (hd != null) {
    prec = hd.precision();
    scale = hd.scale();
  }
  DecimalTypeInfo typeInfo = TypeInfoFactory.getDecimalTypeInfo(prec, scale);
  return new ExprNodeConstantDesc(typeInfo, strVal);
}
{code} It should use the HiveDecimal instead: return new ExprNodeConstantDesc(typeInfo, hd);
[jira] [Created] (HIVE-10322) TestJdbcWithMiniHS2.testNewConnectionConfiguration fails
Chaoyu Tang created HIVE-10322: -- Summary: TestJdbcWithMiniHS2.testNewConnectionConfiguration fails Key: HIVE-10322 URL: https://issues.apache.org/jira/browse/HIVE-10322 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial The test org.apache.hive.jdbc.TestJdbcWithMiniHS2.testNewConnectionConfiguration fails with the following error: {code}
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: org.apache.hive.service.cli.HiveSQLException: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.http.max.worker.threads does not exists.
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:243)
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:234)
	at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:513)
	at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:188)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
	at java.sql.DriverManager.getConnection(DriverManager.java:571)
	at java.sql.DriverManager.getConnection(DriverManager.java:233)
	at org.apache.hive.jdbc.TestJdbcWithMiniHS2.testNewConnectionConfiguration(TestJdbcWithMiniHS2.java:275)
Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: org.apache.hive.service.cli.HiveSQLException: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.http.max.worker.threads does not exists.
{code} It seems related to HIVE-10271 (removal of the hive.server2.thrift.http.min/max.worker.threads properties).
[jira] [Created] (HIVE-10362) Support Type check/conversion in dynamic partition column
Chaoyu Tang created HIVE-10362: -- Summary: Support Type check/conversion in dynamic partition column Key: HIVE-10362 URL: https://issues.apache.org/jira/browse/HIVE-10362 Project: Hive Issue Type: Improvement Components: Query Processor, Types Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang There are quite a lot of issues associated with non-normalized or type-mismatched values for partition columns, and Hive has many ways to introduce such problematic data. HIVE-10307 mainly provides the support to type check/convert/normalize the partition column value in the static partition specification. This JIRA deals with the partition column type in dynamic partition insert. Currently any data can be inserted as a partition column value as long as it is quoted as a string. For example: {code}
create table dynparttypechecknum (key int, value string) partitioned by (part int);
insert into dynparttypechecknum partition (part) select key, value, '1' from src limit 1;
show partitions dynparttypechecknum;
-- part=1
{code} The partition column value is a non-normalized int 1. It causes some unnecessary problems such as integer partition column JDO filter pushdown (see HIVE-6052) and others like HIVE-10210.
[jira] [Created] (HIVE-10363) Provide a way to normalize the legacy partition column values
Chaoyu Tang created HIVE-10363: -- Summary: Provide a way to normalize the legacy partition column values Key: HIVE-10363 URL: https://issues.apache.org/jira/browse/HIVE-10363 Project: Hive Issue Type: Improvement Components: Types Reporter: Chaoyu Tang Assignee: Chaoyu Tang We have seen a lot of issues caused by non-normalized partition column values, such as HIVE-10210, HIVE-6052, etc. Besides type checking, converting and normalizing the partition column values in insert/alter partition operations (see HIVE-10307, HIVE-10362), we need to provide an easy way for users to normalize their legacy partition column data. HIVE-5700 attempted to do this at the metastore SQL level, but given the many flavors and versions of backend databases, that is quite hard to achieve and also error prone. The SQL portion of the change in HIVE-5700 has been reverted by HIVE-9445/HIVE-9509. Currently "alter table .. partition ... rename" could be used to normalize the partition column for each partition, but I am considering whether there is a more convenient and better way to do that.
[jira] [Created] (HIVE-10541) Beeline requires newline at the end of each query in a file
Chaoyu Tang created HIVE-10541: -- Summary: Beeline requires newline at the end of each query in a file Key: HIVE-10541 URL: https://issues.apache.org/jira/browse/HIVE-10541 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Beeline requires a newline at the end of each query in a file.
[jira] [Created] (HIVE-10571) HiveMetaStoreClient should close existing thrift connection before its reconnect
Chaoyu Tang created HIVE-10571: -- Summary: HiveMetaStoreClient should close existing thrift connection before its reconnect Key: HIVE-10571 URL: https://issues.apache.org/jira/browse/HIVE-10571 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang HiveMetaStoreClient should first close its existing thrift connection, whether it is already dead or still alive, before opening another connection in its reconnect() method. Otherwise, it might lead to huge resource accumulation or leaks on the HMS side when a client keeps retrying.
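The close-before-reopen behavior proposed above can be sketched in plain Java. StubTransport and ReconnectDemo below are illustrative stand-ins (not the actual HiveMetaStoreClient or Thrift classes); a static counter of live connections shows how skipping the close leaks one connection per retry:

```java
// A stand-in for a Thrift transport; the real client wraps a TTransport,
// but the close-before-reopen pattern is the same.
class StubTransport {
    static int openCount = 0;          // live connections, to make the leak visible
    private boolean open;
    void open()  { if (!open) { open = true;  openCount++; } }
    void close() { if (open)  { open = false; openCount--; } }
}

public class ReconnectDemo {
    private StubTransport transport = new StubTransport();

    public ReconnectDemo() { transport.open(); }

    // Leaky variant: forgets the old transport without closing it,
    // so the server-side resource is never released.
    public void leakyReconnect() {
        transport = new StubTransport();
        transport.open();
    }

    // Proposed behavior: close the existing connection first, whether
    // dead or alive, then open a fresh one.
    public void safeReconnect() {
        transport.close();
        transport = new StubTransport();
        transport.open();
    }
}
```

After five retries, the leaky variant holds six live connections while the safe variant holds exactly one.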
[jira] [Created] (HIVE-10587) ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column
Chaoyu Tang created HIVE-10587: -- Summary: ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column Key: HIVE-10587 URL: https://issues.apache.org/jira/browse/HIVE-10587 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor In the SemanticAnalyzer method Operator genConversionSelectOperator(String dest, QB qb, Operator input, TableDesc table_desc, DynamicPartitionCtx dpCtx) throws SemanticException, the DP column's ExprNodeColumnDesc is created by passing false as the isPartitionColOrVirtualCol parameter value: {code}
// DP columns starts with tableFields.size()
for (int i = tableFields.size() + (updating() ? 1 : 0); i < rowFields.size(); ++i) {
  TypeInfo rowFieldTypeInfo = rowFields.get(i).getType();
  ExprNodeDesc column = new ExprNodeColumnDesc(
      rowFieldTypeInfo, rowFields.get(i).getInternalName(), "", false);
  expressions.add(column);
}
{code} I think it should be true instead.
[jira] [Created] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
Chaoyu Tang created HIVE-10620: -- Summary: ZooKeeperHiveLock overrides equal() method but not hashcode() Key: HIVE-10620 URL: https://issues.apache.org/jira/browse/HIVE-10620 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang ZooKeeperHiveLock overrides the public boolean equals(Object o) method but not public int hashCode(). This violates the Java contract that equal objects must have equal hash codes, and may cause unexpected results.
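A minimal illustration of the violated contract (BrokenLock and FixedLock are hypothetical stand-ins for ZooKeeperHiveLock): a class that overrides equals() without hashCode() can make hash-based collections misbehave, because two equal locks may land in different hash buckets:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Mirrors the bug: equals() overridden, hashCode() left at identity hash.
class BrokenLock {
    final String path;
    BrokenLock(String path) { this.path = path; }
    @Override public boolean equals(Object o) {
        return o instanceof BrokenLock && ((BrokenLock) o).path.equals(path);
    }
    // hashCode() deliberately not overridden
}

// The fix: hashCode() derived from the same field(s) as equals().
class FixedLock extends BrokenLock {
    FixedLock(String path) { super(path); }
    @Override public int hashCode() { return Objects.hash(path); }
}

public class LockHashDemo {
    // Often returns false: the two distinct instances usually have
    // different identity hash codes, so contains() looks in the wrong bucket.
    public static boolean brokenContains() {
        Set<BrokenLock> s = new HashSet<>();
        s.add(new BrokenLock("/hive/db/tbl"));
        return s.contains(new BrokenLock("/hive/db/tbl"));
    }

    // Always true once hashCode() agrees with equals().
    public static boolean fixedContains() {
        Set<FixedLock> s = new HashSet<>();
        s.add(new FixedLock("/hive/db/tbl"));
        return s.contains(new FixedLock("/hive/db/tbl"));
    }
}
```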
[jira] [Created] (HIVE-10835) Concurrency issues in JDBC driver
Chaoyu Tang created HIVE-10835: -- Summary: Concurrency issues in JDBC driver Key: HIVE-10835 URL: https://issues.apache.org/jira/browse/HIVE-10835 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Though the JDBC specification states that "Each Connection object can create multiple Statement objects that may be used concurrently by the program", that does not work in the current Hive JDBC driver. In addition, there are also race conditions between DatabaseMetaData, Statement and ResultSet, since they all make RPC calls to HS2 over the same Thrift transport within a connection. So we need a connection-level lock to serialize all these RPC calls in a connection.
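The direction of the fix can be sketched as follows (LockedConnection and rpc() are illustrative names, not the actual HiveConnection API): every RPC that would share the connection's Thrift transport synchronizes on one connection-level lock, so at most one call is ever on the wire at a time:

```java
// Sketch of a connection-level lock serializing RPCs that share one transport.
public class LockedConnection {
    private final Object transportLock = new Object();
    private int inFlight = 0;     // calls currently "on the wire"
    private int maxInFlight = 0;  // should never exceed 1 under the lock

    // Stands in for any call from Statement/DatabaseMetaData/ResultSet
    // that goes over the shared transport.
    public String rpc(String request) throws InterruptedException {
        synchronized (transportLock) {
            inFlight++;
            maxInFlight = Math.max(maxInFlight, inFlight);
            Thread.sleep(1);      // simulate wire time inside the critical section
            inFlight--;
            return "ok:" + request;
        }
    }

    public int maxObserved() { return maxInFlight; }
}
```

Even with several threads (several Statement objects) hammering the same connection, maxObserved() stays at 1.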
[jira] [Created] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
Chaoyu Tang created HIVE-10976: -- Summary: Redundant HiveMetaStore connect check in HS2 CLIService start Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial During HS2 startup, CLIService start() performs a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, where it starts a SessionState and already establishes a connection to HMS.
[jira] [Created] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
Chaoyu Tang created HIVE-10977: -- Summary: No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled Key: HIVE-10977 URL: https://issues.apache.org/jira/browse/HIVE-10977 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor When hive.metastore.try.direct.sql is set to false, HMS will use JDO to retrieve data, therefore it is not necessary to instantiate the expensive MetaStoreDirectSql during ObjectStore initialization.
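A sketch of the idea, assuming a simple Properties-based config: the config key hive.metastore.try.direct.sql is real, but StoreInit and DirectSqlHelper below are stand-ins for ObjectStore and MetaStoreDirectSql, not the actual classes. The expensive helper is constructed only when direct SQL is enabled:

```java
import java.util.Properties;

public class StoreInit {
    static int directSqlInstances = 0;   // counts expensive constructions

    // Stand-in for MetaStoreDirectSql, whose construction is costly.
    static class DirectSqlHelper {
        DirectSqlHelper() { directSqlInstances++; }
    }

    private DirectSqlHelper directSql;

    public StoreInit(Properties conf) {
        // Hive defaults hive.metastore.try.direct.sql to true.
        boolean tryDirectSql = Boolean.parseBoolean(
            conf.getProperty("hive.metastore.try.direct.sql", "true"));
        if (tryDirectSql) {              // skip the costly setup when disabled
            directSql = new DirectSqlHelper();
        }
    }

    public boolean hasDirectSql() { return directSql != null; }
}
```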
[jira] [Created] (HIVE-11100) Beeline should escape semi-colon in queries
Chaoyu Tang created HIVE-11100: -- Summary: Beeline should escape semi-colon in queries Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Improvement Components: Beeline Affects Versions: 1.2.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Beeline should escape semicolons in queries. For example, queries like the following: {code}
CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n';
{code} or {code}
CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n';
{code} both fail. But the second query, with the semicolon escaped by "\", works in the CLI.
[jira] [Created] (HIVE-11157) Hive.get(HiveConf) returns same Hive object to different user sessions
Chaoyu Tang created HIVE-11157: -- Summary: Hive.get(HiveConf) returns same Hive object to different user sessions Key: HIVE-11157 URL: https://issues.apache.org/jira/browse/HIVE-11157 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.3.0, 2.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently, Hive.get(HiveConf) creates and returns a new Hive object when the thread-local Hive is null or the HMS config is not compatible, but it does not do so when it is called in a thread that has been switched to execute a session with a different userId. This causes an impersonation issue with HMS. It is related to HIVE-7890.
[jira] [Created] (HIVE-11666) Discrepancy in INSERT OVERWRITE LOCAL DIRECTORY between Beeline and CLI
Chaoyu Tang created HIVE-11666: -- Summary: Discrepancy in INSERT OVERWRITE LOCAL DIRECTORY between Beeline and CLI Key: HIVE-11666 URL: https://issues.apache.org/jira/browse/HIVE-11666 Project: Hive Issue Type: Sub-task Components: CLI, HiveServer2 Reporter: Chaoyu Tang Hive CLI writes to the local host for INSERT OVERWRITE LOCAL DIRECTORY, but Beeline writes to the HS2 local directory. For a user migrating from CLI to Beeline, it might be a big change.
[jira] [Created] (HIVE-11667) Support Trash and Snapshot in Truncate Table
Chaoyu Tang created HIVE-11667: -- Summary: Support Trash and Snapshot in Truncate Table Key: HIVE-11667 URL: https://issues.apache.org/jira/browse/HIVE-11667 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Currently Truncate Table (or Partition) is implemented using FileSystem.delete followed by recreating the directory. It does not honor HDFS Trash even when Trash is turned on, and the table/partition cannot be truncated if it has a snapshot.
[jira] [Created] (HIVE-11786) Deprecate the use of redundant column in column stats related tables
Chaoyu Tang created HIVE-11786: -- Summary: Deprecate the use of redundant column in column stats related tables Key: HIVE-11786 URL: https://issues.apache.org/jira/browse/HIVE-11786 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant columns such as DB_NAME, TABLE_NAME, PARTITION_NAME, since these tables already have foreign keys like TBL_ID or PART_ID referencing TBLS or PARTITIONS. These redundant columns violate database normalization rules and cause a lot of inconvenience (sometimes difficulty) in implementing column stats related features. For example, when renaming a table, we have to update the TABLE_NAME column in these tables as well, which is unnecessary. This JIRA is first to deprecate the use of these columns at the HMS code level. A follow-up JIRA will be opened to focus on the DB schema change and upgrade.
[jira] [Created] (HIVE-11787) Remove the redundant columns in TAB_COL_STATS and PART_COL_STATS
Chaoyu Tang created HIVE-11787: -- Summary: Remove the redundant columns in TAB_COL_STATS and PART_COL_STATS Key: HIVE-11787 URL: https://issues.apache.org/jira/browse/HIVE-11787 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang After HIVE-11786 deprecates the use of the redundant columns in TAB_COL_STATS and PART_COL_STATS at the HMS code level, the columns DB_NAME/TABLE_NAME in TAB_COL_STATS and DB_NAME/TABLE_NAME/PARTITION_NAME in PART_COL_STATS are useless and should be removed.
[jira] [Created] (HIVE-11788) Column stats should be preserved after db/table/partition rename
Chaoyu Tang created HIVE-11788: -- Summary: Column stats should be preserved after db/table/partition rename Key: HIVE-11788 URL: https://issues.apache.org/jira/browse/HIVE-11788 Project: Hive Issue Type: Bug Components: Metastore, Statistics Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently we simply delete the column stats after renaming a database, table, or partition, since there was no easy way in HMS to update the DB_NAME, TABLE_NAME and PARTITION_NAME in TAB_COL_STATS and PART_COL_STATS. With the removal of these redundant columns from those tables (HIVE-11786), we can keep the column stats for any rename operation that does not change a column name or type.
[jira] [Created] (HIVE-11926) NPE could occur in collectStatistics when column type is varchar
Chaoyu Tang created HIVE-11926: -- Summary: NPE could occur in collectStatistics when column type is varchar Key: HIVE-11926 URL: https://issues.apache.org/jira/browse/HIVE-11926 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang If column stats are calculated and populated into HMS by a client such as Impala, the column type name stored in TAB_COL_STATS/PART_COL_STATS could be in uppercase (e.g. VARCHAR, DECIMAL). When Hive collects stats for these columns during optimization (with hive.stats.fetch.column.stats set to true), it throws an NPE. See the error message below: {code}
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: NullPointerException null
	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:103)
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172)
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:379)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:366)
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: null
	at org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:636)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:623)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:180)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124)
truncated
{code}
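A hedged sketch of the failure mode (the map and method names below are illustrative, not the actual StatsUtils code): a case-sensitive lookup keyed by lowercase type names returns null for an uppercase "VARCHAR", and unboxing that null Integer is the NPE; normalizing the stored type name before the lookup avoids it:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ColTypeLookup {
    // Hypothetical per-type defaults, keyed by lowercase names only.
    private static final Map<String, Integer> DEFAULT_LEN = new HashMap<>();
    static {
        DEFAULT_LEN.put("varchar", 65535);
        DEFAULT_LEN.put("decimal", 38);
    }

    // Buggy variant: get() returns null for "VARCHAR", and unboxing
    // null into int throws NullPointerException.
    public static int avgLenBuggy(String colType) {
        return DEFAULT_LEN.get(colType);
    }

    // Fixed variant: normalize case before the lookup.
    public static int avgLenFixed(String colType) {
        return DEFAULT_LEN.get(colType.toLowerCase(Locale.ROOT));
    }
}
```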
[jira] [Created] (HIVE-11941) Update committer list
Chaoyu Tang created HIVE-11941: -- Summary: Update committer list Key: HIVE-11941 URL: https://issues.apache.org/jira/browse/HIVE-11941 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Priority: Minor Please update the committer list in http://hive.apache.org/people.html: --- Name: Chaoyu Tang Apache ID: ctang Organization: Cloudera (www.cloudera.com)
[jira] [Created] (HIVE-11964) RelOptHiveTable.hiveColStatsMap might contain mismatched column stats
Chaoyu Tang created HIVE-11964: -- Summary: RelOptHiveTable.hiveColStatsMap might contain mismatched column stats Key: HIVE-11964 URL: https://issues.apache.org/jira/browse/HIVE-11964 Project: Hive Issue Type: Bug Components: Query Planning, Statistics Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang RelOptHiveTable.hiveColStatsMap might contain mismatched stats, since it was built by assuming that the stats returned from {code}
hiveColStats = StatsUtils.getTableColumnStats(hiveTblMetadata, hiveNonPartitionCols, nonPartColNamesThatRqrStats);
{code} or {code}
HiveMetaStoreClient.getTableColumnStatistics(dbName, tableName, colNames)
{code} have the same order as the requested columns. But actually the order is non-deterministic. Therefore, the returned stats should be re-ordered before they are put into hiveColStatsMap.
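The proposed re-ordering could look like the following sketch (ColStat is a stand-in for Hive's ColStatistics, not the real class): match the returned stats back to the requested column order by name before building the map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatsReorder {
    // Minimal stand-in for a per-column statistics object.
    static class ColStat {
        final String colName;
        ColStat(String colName) { this.colName = colName; }
    }

    // Re-orders the (non-deterministically ordered) returned stats to
    // match the requested column list, keyed by column name.
    public static List<ColStat> reorder(List<String> requested, List<ColStat> returned) {
        Map<String, ColStat> byName = new HashMap<>();
        for (ColStat cs : returned) {
            byName.put(cs.colName, cs);
        }
        List<ColStat> ordered = new ArrayList<>(requested.size());
        for (String name : requested) {
            ordered.add(byName.get(name));   // null if stats for a column are missing
        }
        return ordered;
    }
}
```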
[jira] [Created] (HIVE-11995) Remove repetitively setting permissions in insert/load overwrite partition
Chaoyu Tang created HIVE-11995: -- Summary: Remove repetitively setting permissions in insert/load overwrite partition Key: HIVE-11995 URL: https://issues.apache.org/jira/browse/HIVE-11995 Project: Hive Issue Type: Bug Components: Security Reporter: Chaoyu Tang Assignee: Chaoyu Tang When hive.warehouse.subdir.inherit.perms is set to true, insert/load overwrite .. partition sets table and partition permissions repetitively, which is unnecessary and causes a performance issue, especially when multiple levels of partitions are involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12053) Stats performance regression caused by HIVE-11786
Chaoyu Tang created HIVE-12053: -- Summary: Stats performance regression caused by HIVE-11786 Key: HIVE-12053 URL: https://issues.apache.org/jira/browse/HIVE-12053 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang HIVE-11786 tried to normalize the tables TAB_COL_STATS/PART_COL_STATS but caused a performance regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12188) DoAs does not work properly in non-kerberos secured HS2
Chaoyu Tang created HIVE-12188: -- Summary: DoAs does not work properly in non-kerberos secured HS2 Key: HIVE-12188 URL: https://issues.apache.org/jira/browse/HIVE-12188 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang The case with the following settings is valid but still does not seem to work correctly in the current HS2 == hive.server2.authentication=NONE (or LDAP) hive.server2.enable.doAs= true hive.metastore.sasl.enabled=true (with HMS Kerberos enabled) == Currently HS2 is able to fetch a delegation token from a Kerberos-secured HMS only when HS2 itself is also Kerberos secured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12218) Unable to create a like table for an hbase backed table
Chaoyu Tang created HIVE-12218: -- Summary: Unable to create a like table for an hbase backed table Key: HIVE-12218 URL: https://issues.apache.org/jira/browse/HIVE-12218 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang For an HBase backed table: {code} CREATE TABLE hbasetbl (key string, state string, country string, country_id int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "info:state,info:country,info:country_id" ); {code} Creating its like table with a query such as create table hbasetbl_like like hbasetbl; fails with the error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: must specify an InputFormat class -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12245) Support column comments for an HBase backed table
Chaoyu Tang created HIVE-12245: -- Summary: Support column comments for an HBase backed table Key: HIVE-12245 URL: https://issues.apache.org/jira/browse/HIVE-12245 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Currently the column comments of an HBase backed table are always returned as "from deserializer". For example, {code} CREATE TABLE hbasetbl (key string comment 'It is key', state string comment 'It is state', country string comment 'It is country', country_id int comment 'It is country_id') STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "info:state,info:country,info:country_id" ); hive> describe hbasetbl; key string from deserializer state string from deserializer country string from deserializer country_id int from deserializer {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12248) The rawStore used in DBTokenStore should be thread-safe
Chaoyu Tang created HIVE-12248: -- Summary: The rawStore used in DBTokenStore should be thread-safe Key: HIVE-12248 URL: https://issues.apache.org/jira/browse/HIVE-12248 Project: Hive Issue Type: Bug Components: Authentication Reporter: Chaoyu Tang Assignee: Chaoyu Tang A non-thread-safe implementation of RawStore, particularly ObjectStore, set in DBTokenStore is being shared by multiple threads, which causes a race condition in DataNucleus when accessing the backend DB. The DN PersistenceManager (PM) in ObjectStore is not thread safe, so DBTokenStore should use a ThreadLocal ObjectStore. The following errors might be root-caused by the race condition in the DN PM. {code} Object of type "org.apache.hadoop.hive.metastore.model.MDelegationToken" is detached. Detached objects cannot be used with this operation. org.datanucleus.exceptions.ObjectDetachedException: Object of type "org.apache.hadoop.hive.metastore.model.MDelegationToken" is detached. Detached objects cannot be used with this operation. at org.datanucleus.ExecutionContextImpl.assertNotDetached(ExecutionContextImpl.java:5728) at org.datanucleus.ExecutionContextImpl.retrieveObject(ExecutionContextImpl.java:1859) at org.datanucleus.ExecutionContextThreadedImpl.retrieveObject(ExecutionContextThreadedImpl.java:203) at org.datanucleus.api.jdo.JDOPersistenceManager.jdoRetrieve(JDOPersistenceManager.java:605) at org.datanucleus.api.jdo.JDOPersistenceManager.retrieveAll(JDOPersistenceManager.java:693) at org.datanucleus.api.jdo.JDOPersistenceManager.retrieveAll(JDOPersistenceManager.java:713) at org.apache.hadoop.hive.metastore.ObjectStore.getAllTokenIdentifiers(ObjectStore.java:6517) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
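The ThreadLocal approach suggested above can be sketched like this. It is a minimal, hypothetical illustration: the Store class stands in for the real ObjectStore, and the names are made up; it only demonstrates giving each handler thread its own lazily created instance instead of sharing one non-thread-safe object.

```java
// Hypothetical sketch: each thread gets its own Store instance via
// ThreadLocal, so no Store (stand-in for ObjectStore) is ever shared.
public class ThreadLocalStoreDemo {
    public static class Store {
        // Record the creating thread so per-thread ownership is observable.
        public final long ownerThreadId = Thread.currentThread().getId();
    }

    private static final ThreadLocal<Store> STORE =
            ThreadLocal.withInitial(Store::new);

    // Lazily creates the calling thread's instance on first access,
    // then always returns that same instance for this thread.
    public static Store get() {
        return STORE.get();
    }
}
```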
[jira] [Created] (HIVE-12259) Command containing semicolon is broken in Beeline
Chaoyu Tang created HIVE-12259: -- Summary: Command containing semicolon is broken in Beeline Key: HIVE-12259 URL: https://issues.apache.org/jira/browse/HIVE-12259 Project: Hive Issue Type: Bug Components: Beeline Reporter: Chaoyu Tang Assignee: Chaoyu Tang A Beeline command (!cmd) containing a semicolon is broken. For example: !connect jdbc:hive2://localhost:10001/default;principal=hive/xyz@realm.com is broken because the embedded ";" prevents it from being run with execCommandWithPrefix as a whole command. {code} if (line.startsWith(COMMAND_PREFIX) && !line.contains(";")) { // handle the case "!cmd" for beeline return execCommandWithPrefix(line); } else { return commands.sql(line, getOpts().getEntireLineAsCommand()); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
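A minimal sketch of the kind of fix implied: base the dispatch decision on the command prefix alone, so an embedded ";" (e.g. in a JDBC URL) no longer disqualifies the line. This is a hypothetical helper, not Beeline's actual code.

```java
// Hypothetical sketch: a "!cmd" line is always a whole Beeline command,
// regardless of whether it contains ';'.
public class BeelineDispatch {
    public static final String COMMAND_PREFIX = "!";

    // true  -> would be run via execCommandWithPrefix as a whole command
    // false -> would be handed off to the SQL path
    public static boolean isBeelineCommand(String line) {
        return line.trim().startsWith(COMMAND_PREFIX);
    }
}
```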
[jira] [Created] (HIVE-12270) Add DBTokenStore support to HS2 delegation token
Chaoyu Tang created HIVE-12270: -- Summary: Add DBTokenStore support to HS2 delegation token Key: HIVE-12270 URL: https://issues.apache.org/jira/browse/HIVE-12270 Project: Hive Issue Type: New Feature Reporter: Chaoyu Tang Assignee: Chaoyu Tang DBTokenStore was initially introduced by HIVE-3255 in Hive-0.12, mainly for the HMS delegation token. Later, in Hive-0.13, HS2 delegation token support was introduced by HIVE-5155, but it used MemoryTokenStore as the token store. HIVE-9622's approach of using the shared RawStore (or HMSHandler) to access the token/key information in the HMS DB directly from HS2 does not seem to be the right way to support DBTokenStore in HS2. I think we should use HiveMetaStoreClient in HS2 instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12306) hbase_queries.q fails in Hive 1.3.0
Chaoyu Tang created HIVE-12306: -- Summary: hbase_queries.q fails in Hive 1.3.0 Key: HIVE-12306 URL: https://issues.apache.org/jira/browse/HIVE-12306 Project: Hive Issue Type: Bug Affects Versions: 1.3.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial hbase_queries.q is failing (only in version 1.3.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12346) Internally used variables in HiveConf should not be settable via command
Chaoyu Tang created HIVE-12346: -- Summary: Internally used variables in HiveConf should not be settable via command Key: HIVE-12346 URL: https://issues.apache.org/jira/browse/HIVE-12346 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Some HiveConf variables such as hive.added.jars.path are only for internal use and should not be settable via the set command. We have seen many cases where users mistakenly set these variables using the set command, even though some of them are documented as "internal parameters" in Hive. The command usually succeeds but sometimes does not take effect, which causes confusion. For example, hive.added.jars.path can be set via the set command but is sometimes overridden by session resource jars at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed
Chaoyu Tang created HIVE-12365: -- Summary: Added resource path is sent to cluster as an empty string when externally removed Key: HIVE-12365 URL: https://issues.apache.org/jira/browse/HIVE-12365 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Sometimes the resources (e.g. jars) added via a command like "add jars " are removed externally from their file paths for some reason. Their paths are then sent to the cluster as empty strings, which fails even queries that do not need these jars during execution. The error looks like the following: {code} 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003 java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.<init>(Path.java:135) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12505) Insert overwrite in same encrypted zone silently fails to remove some existing files
Chaoyu Tang created HIVE-12505: -- Summary: Insert overwrite in same encrypted zone silently fails to remove some existing files Key: HIVE-12505 URL: https://issues.apache.org/jira/browse/HIVE-12505 Project: Hive Issue Type: Bug Components: Encryption Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang With HDFS Trash enabled but its encryption zone lower than the Hive data directory, the insert overwrite command silently fails to trash the existing files during overwrite, which could lead to unexpectedly incorrect results (more rows returned than expected) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12566) Incorrect result returns when using COALESCE in WHERE condition with LEFT JOIN
Chaoyu Tang created HIVE-12566: -- Summary: Incorrect result returns when using COALESCE in WHERE condition with LEFT JOIN Key: HIVE-12566 URL: https://issues.apache.org/jira/browse/HIVE-12566 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 0.13.0 Reporter: Chaoyu Tang Priority: Critical The left join query with an on/where clause returns an incorrect result (more rows are returned). See the reproducible sample below. Left table with data: {code} CREATE TABLE ltable (i int, la int, lk1 string, lk2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; --- 1,\N,CD5415192314304,00071 2,\N,CD5415192225530,00071 {code} Right table with data: {code} CREATE TABLE rtable (ra int, rk1 string, rk2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; --- 1,CD5415192314304,00071 45,CD5415192314304,00072 {code} Query: {code} SELECT * FROM ltable l LEFT OUTER JOIN rtable r on (l.lk1 = r.rk1 AND l.lk2 = r.rk2) WHERE COALESCE(l.la,'EMPTY')=COALESCE(r.ra,'EMPTY'); {code} Result returned: {code} 1 NULL CD5415192314304 00071 NULL NULL NULL 2 NULL CD5415192225530 00071 NULL NULL NULL {code} The correct result should be {code} 2 NULL CD5415192225530 00071 NULL NULL NULL {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12607) Hive fails on zero length sequence files
Chaoyu Tang created HIVE-12607: -- Summary: Hive fails on zero length sequence files Key: HIVE-12607 URL: https://issues.apache.org/jira/browse/HIVE-12607 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Flume will, at times, generate zero length sequence files which cause the Hive query failure. To reproduce the issue: {code} > create external table test (id string) partitioned by (system string,date string) STORED AS SEQUENCEFILE LOCATION '/user/me/test'; hadoop fs -mkdir /user/me/test/logs hadoop fs -mkdir /user/me/test/logs/date=2014 hadoop fs -touchz /user/me/test/logs/date=2014/a.txt hive -> ALTER TABLE test ADD PARTITION (system = 'logs',date='2014') location '/user/me/test/logs/date=2014'; -> select * from test t1,test t2 where t1.id = t2.id; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12713) Miscellaneous improvements in driver compile and execute logging
Chaoyu Tang created HIVE-12713: -- Summary: Miscellaneous improvements in driver compile and execute logging Key: HIVE-12713 URL: https://issues.apache.org/jira/browse/HIVE-12713 Project: Hive Issue Type: Improvement Components: Logging Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Miscellaneous compile and execute logging improvements include: 1. ensuring that only the redacted query is logged 2. removing redundant variable substitution in HS2 SQLOperation 3. logging the query and its compilation time without having to enable PerfLogger debug, to help identify badly written queries which take a lot of time to compile and probably cause other good queries to be queued (HIVE-12516) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12812) Enable mapred.input.dir.recursive by default to support union with aggregate function
Chaoyu Tang created HIVE-12812: -- Summary: Enable mapred.input.dir.recursive by default to support union with aggregate function Key: HIVE-12812 URL: https://issues.apache.org/jira/browse/HIVE-12812 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1, 2.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang When union remove optimization is enabled, a union query with an aggregate function writes its subquery intermediate results to subdirs, which requires mapred.input.dir.recursive to be enabled in order for them to be fetched. This property is not defined by default in Hive and is often overlooked by users, which causes query failures that are hard to debug. So we need to set mapred.input.dir.recursive to true whenever union remove optimization is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12840) stats optimize subqueries whenever possible in a union query
Chaoyu Tang created HIVE-12840: -- Summary: stats optimize subqueries whenever possible in a union query Key: HIVE-12840 URL: https://issues.apache.org/jira/browse/HIVE-12840 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Chaoyu Tang HIVE-12788 addressed a data correctness issue in union queries with aggregate functions when stats optimization is enabled. It disables stats optimization for the whole query if any of its subqueries cannot be optimized. [~pxiong] suggested an enhancement to leverage the stats optimizer whenever possible (even for only one branch of a union), and we need to investigate a possible solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12901) Incorrect result from select '\\' like '\\%'
Chaoyu Tang created HIVE-12901: -- Summary: Incorrect result from select '\\' like '\\%' Key: HIVE-12901 URL: https://issues.apache.org/jira/browse/HIVE-12901 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.1 Reporter: Chaoyu Tang The query returns false. MySQL actually also returns 0, which I do not think is right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12965) Insert overwrite local directory should preserve the overwritten directory permission
Chaoyu Tang created HIVE-12965: -- Summary: Insert overwrite local directory should preserve the overwritten directory permission Key: HIVE-12965 URL: https://issues.apache.org/jira/browse/HIVE-12965 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang In Hive, "insert overwrite local directory" first deletes the overwritten directory if it exists, recreates a new one, then copies the files from the src directory to the new local directory. This process sometimes changes the permissions of the overwritten local directory, causing some applications to no longer be able to access its content. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13082) Enable constant propagation optimization in query with left semi join
Chaoyu Tang created HIVE-13082: -- Summary: Enable constant propagation optimization in query with left semi join Key: HIVE-13082 URL: https://issues.apache.org/jira/browse/HIVE-13082 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 2.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently constant folding is only allowed for inner or unique joins; I think it should also be applicable to left semi joins. Otherwise, a query like the following, with multiple joins including a left semi join, fails: {code} select table1.id, table1.val, table2.val2 from table1 inner join table2 on table1.val = 't1val01' and table1.id = table2.id left semi join table3 on table1.dimid = table3.id; {code} with errors: {code} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.0.jar:?] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.0.jar:?] Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) ~[hadoop-common-2.6.0.jar:?] at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) ~[hadoop-common-2.6.0.jar:?] at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) ~[hadoop-common-2.6.0.jar:?] at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446) ~[hadoop-mapreduce-client-core-2.6.0.jar:?] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.0.jar:?] at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.0.jar:?] 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_45] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_45] at java.lang.Thread.run(Thread.java:744) ~[?:1.7.0_45] ... Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3 at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[?:1.7.0_45] at java.util.ArrayList.get(ArrayList.java:411) ~[?:1.7.0_45] at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.(StandardStructObjectInspector.java:109) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:326) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:311) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:181) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:319) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:78) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:138) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:355) 
~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13164) Predicate pushdown may cause cross-product in left semi join
Chaoyu Tang created HIVE-13164: -- Summary: Predicate pushdown may cause cross-product in left semi join Key: HIVE-13164 URL: https://issues.apache.org/jira/browse/HIVE-13164 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang For some left semi join queries like followings: select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0'; or select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0'; Their plans show that they have been converted to keyless cross-product due to the predicate pushdown and the dropping of the on condition. {code} LOGICAL PLAN: t1:t1 TableScan (TS_0) alias: t1 Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator (FIL_18) predicate: (key = 0) (type: boolean) Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_2) Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_9) sort order: Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE Join Operator (JOIN_11) condition map: Left Semi Join 0 to 1 keys: 0 1 Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE Group By Operator (GBY_13) aggregations: count(1) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_14) sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Group By Operator (GBY_15) aggregations: count(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator (FS_17) compressed: false Statistics: Num rows: 1 Data 
size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe t2:t2 TableScan (TS_3) alias: t2 Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator (FIL_19) predicate: ((key = 0) and (value = 'val_0')) (type: boolean) Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_5) Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE Group By Operator (GBY_8) keys: 'val_0' (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_10) sort order: Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE Join Operator (JOIN_11) condition map: Left Semi Join 0 to 1 keys: 0 1 Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE {code} [~gopalv], do you think these plans are valid or not? Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13243) Hive drop table on encryption zone fails for external tables
Chaoyu Tang created HIVE-13243: -- Summary: Hive drop table on encryption zone fails for external tables Key: HIVE-13243 URL: https://issues.apache.org/jira/browse/HIVE-13243 Project: Hive Issue Type: Bug Components: Encryption, Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang When dropping an external table with its data located in an encryption zone, Hive should not throw MetaException(message:Unable to drop table because it is in an encryption zone and trash is enabled. Use PURGE option to skip trash.) in checkTrashPurgeCombination, since the data should not get deleted (or trashed) anyway, regardless of whether HDFS Trash is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13294) AvroSerde leaks the connection in a case when reading schema from a url
Chaoyu Tang created HIVE-13294: -- Summary: AvroSerde leaks the connection in a case when reading schema from a url Key: HIVE-13294 URL: https://issues.apache.org/jira/browse/HIVE-13294 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Chaoyu Tang Assignee: Chaoyu Tang AvroSerde leaks the connection when reading the schema from a URL: in public static Schema determineSchemaOrThrowException { ... return AvroSerdeUtils.getSchemaFor(new URL(schemaString).openStream()); ... } the opened InputStream is never closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
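One conventional way to avoid such a leak is try-with-resources, which closes the stream on every path, including a parse failure. The sketch below is a hypothetical stand-in for AvroSerdeUtils.getSchemaFor(InputStream), not the actual Hive fix: readSchema simply drains the stream to a string, and it wraps IOException so callers need no checked-exception handling.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: consume a schema stream and close it deterministically
// via try-with-resources, instead of leaving it open after parsing.
public class SchemaFetch {
    public static String readSchema(InputStream in) {
        try (InputStream s = in) {   // closed even if reading throws
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = s.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```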
[jira] [Created] (HIVE-13401) Kerberized HS2 with LDAP auth enabled fails the delegation token authentication
Chaoyu Tang created HIVE-13401: -- Summary: Kerberized HS2 with LDAP auth enabled fails the delegation token authentication Key: HIVE-13401 URL: https://issues.apache.org/jira/browse/HIVE-13401 Project: Hive Issue Type: Bug Components: Authentication Reporter: Chaoyu Tang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path
Chaoyu Tang created HIVE-13509: -- Summary: HCatalog getSplits should ignore the partition with invalid path Key: HIVE-13509 URL: https://issues.apache.org/jira/browse/HIVE-13509 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Chaoyu Tang Assignee: Chaoyu Tang It is quite common that there is a discrepancy between a partition directory and its HMS metadata, simply because the directory could be added/deleted externally using hdfs shell commands. Technically this should be fixed by MSCK, alter table .. add/drop commands, etc., but sometimes that might not be practical, especially in a multi-tenant env. This discrepancy does not cause any problem for Hive, which returns no rows for a partition with an invalid (e.g. non-existing) path, but it fails the Pig load with HCatLoader, because the HCatBaseInputFormat getSplits throws an error when getting a split for a non-existing path. The error message might look like: {code} Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) at org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
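The proposed behavior can be sketched as follows. This is a hypothetical helper, not HCatalog's actual code: a Predicate stands in for FileSystem.exists(Path), and the idea is simply to drop partitions whose directory no longer exists before computing splits, instead of letting FileInputFormat throw InvalidInputException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: keep only partition paths that still exist;
// missing ones are skipped rather than failing the whole job.
public class PartitionFilter {
    public static List<String> validPaths(List<String> partitionPaths,
                                          Predicate<String> existsFn) {
        List<String> valid = new ArrayList<>();
        for (String path : partitionPaths) {
            if (existsFn.test(path)) {
                valid.add(path);
            }
            // else: a real implementation would log the skipped partition
        }
        return valid;
    }
}
```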
[jira] [Created] (HIVE-13588) NPE is thrown from MapredLocalTask.executeInChildVM
Chaoyu Tang created HIVE-13588: -- Summary: NPE is thrown from MapredLocalTask.executeInChildVM Key: HIVE-13588 URL: https://issues.apache.org/jira/browse/HIVE-13588 Project: Hive Issue Type: Bug Components: Logging Reporter: Chaoyu Tang Assignee: Chaoyu Tang NPE was thrown out from MapredLocalTask.executeInChildVM in running some queries with CLI, see error below: {code} java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1868) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1595) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) 
[hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_45] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_45] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_45] {code} This is because the operationLog is only applicable to HS2, not CLI, so it might not be set (null). It is related to HIVE-13183 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13590) Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
Chaoyu Tang created HIVE-13590: -- Summary: Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case Key: HIVE-13590 URL: https://issues.apache.org/jira/browse/HIVE-13590 Project: Hive Issue Type: Bug Components: Authentication, Security Reporter: Chaoyu Tang Assignee: Chaoyu Tang In a kerberized HS2 with LDAP authentication enabled, an LDAP user usually logs in with a username in the form username@domain in the multi-domain LDAP case. But it fails if the domain is not in the Hadoop auth_to_local mapping rules; the error is as follows: {code} Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to ct...@mydomain.com at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389) at org.apache.hadoop.security.User.<init>(User.java:48) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13748) TypeInfoParser cannot handle the dash in the field name of a complex type
Chaoyu Tang created HIVE-13748: -- Summary: TypeInfoParser cannot handle the dash in the field name of a complex type Key: HIVE-13748 URL: https://issues.apache.org/jira/browse/HIVE-13748 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor hive> create table y(col struct<`a-b`:double> COMMENT 'type field has a dash'); FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: Error: : expected at the position 8 of 'struct' but '-' is found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13953) Issues in HiveLockObject equals method
Chaoyu Tang created HIVE-13953: -- Summary: Issues in HiveLockObject equals method Key: HIVE-13953 URL: https://issues.apache.org/jira/browse/HIVE-13953 Project: Hive Issue Type: Bug Components: Locking Reporter: Chaoyu Tang Assignee: Chaoyu Tang There are two issues in the equals method of HiveLockObject: {code} @Override public boolean equals(Object o) { if (!(o instanceof HiveLockObject)) { return false; } HiveLockObject tgt = (HiveLockObject) o; return Arrays.equals(pathNames, tgt.pathNames) && data == null ? tgt.getData() == null : tgt.getData() != null && data.equals(tgt.getData()); } {code} 1. Arrays.equals(pathNames, tgt.pathNames) might return false for the same path in HiveLockObject, since in current Hive the pathname components may be stored in two ways: taking a dynamic partition path db/tbl/part1/part2 as an example, it might be stored in pathNames as an array of four elements (db, tbl, part1, part2) or as an array with the single element db/tbl/part1/part2. It would be safer to compare the pathNames using StringUtils.equals(this.getName(), tgt.getName()). 2. The comparison logic is not right: because && binds tighter than the ternary operator, the expression parses as (Arrays.equals(...) && data == null) ? ... : ..., so the path check is not ANDed with the data check. A corrected version: {code} @Override public boolean equals(Object o) { if (!(o instanceof HiveLockObject)) { return false; } HiveLockObject tgt = (HiveLockObject) o; return StringUtils.equals(this.getName(), tgt.getName()) && (data == null ? tgt.getData() == null : data.equals(tgt.getData())); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
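The precedence problem can be demonstrated in isolation. This is a minimal sketch (the method and class names are hypothetical, mirroring but not copying HiveLockObject): with the original expression, two locks with different paths but equal data still compare equal.

```java
// Demonstrates the operator-precedence bug: `p && c ? x : y` parses as
// `(p && c) ? x : y`, folding the path check into the ternary condition.
public class LockEqualsDemo {

    // The original (buggy) expression shape from HiveLockObject.equals.
    static boolean buggyEquals(boolean pathsEqual, String data, String tgtData) {
        return pathsEqual && data == null ? tgtData == null
                : tgtData != null && data.equals(tgtData);
    }

    // Corrected expression: parentheses keep the path check ANDed in.
    static boolean fixedEquals(boolean pathsEqual, String data, String tgtData) {
        return pathsEqual
                && (data == null ? tgtData == null : data.equals(tgtData));
    }

    public static void main(String[] args) {
        // Different paths, same data: the buggy version wrongly says equal.
        System.out.println(buggyEquals(false, "EXCLUSIVE", "EXCLUSIVE")); // true (wrong)
        System.out.println(fixedEquals(false, "EXCLUSIVE", "EXCLUSIVE")); // false (right)
    }
}
```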
[jira] [Created] (HIVE-13959) MoveTask should only release its query associated locks
Chaoyu Tang created HIVE-13959: -- Summary: MoveTask should only release its query associated locks Key: HIVE-13959 URL: https://issues.apache.org/jira/browse/HIVE-13959 Project: Hive Issue Type: Bug Components: Locking Reporter: Chaoyu Tang Assignee: Chaoyu Tang releaseLocks in MoveTask releases all locks under a HiveLockObject's pathNames, but some of the locks under these pathNames might belong to other queries and should not be released. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13975) Hive import table fails if there is no write access to the source location
Chaoyu Tang created HIVE-13975: -- Summary: Hive import table fails if there is no write access to the source location Key: HIVE-13975 URL: https://issues.apache.org/jira/browse/HIVE-13975 Project: Hive Issue Type: Bug Components: Import/Export Reporter: Chaoyu Tang Assignee: Chaoyu Tang It does not seem right that write permission is needed on the source side for import table: the CopyTask in import needs to create a staging directory under the imported source directory, so a user who does not have write permission on the source directory will get an error like the following: {code} Caused by: java.lang.RuntimeException: Cannot create staging directory 'hdfs://quickstart.cloudera:8020/user/hive/exp_t1/.hive-staging_hive_2016-05-26_16-38-29_453_8739265934924968327-1': Permission denied: user=test1, access=WRITE, inode="/user/hive/exp_t1":anonymous:supergroup:drwxrwxr-x ... org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:952) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:945) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1856) at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:518) at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:234) ... 23 more {code} There are three tasks involved in import table: CopyTask, DDLTask and MoveTask. I wonder if the CopyTask is really needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14161) from_utc_timestamp()/to_utc_timestamp return incorrect results with EST
Chaoyu Tang created HIVE-14161: -- Summary: from_utc_timestamp()/to_utc_timestamp return incorrect results with EST Key: HIVE-14161 URL: https://issues.apache.org/jira/browse/HIVE-14161 Project: Hive Issue Type: Bug Components: UDF Reporter: Chaoyu Tang Assignee: Chaoyu Tang {code} hive> SELECT to_utc_timestamp('2016-06-30 06:00:00', 'PST'); OK 2016-06-30 13:00:00 ==>Correct, UTC is 7 hours ahead of PST Time taken: 1.674 seconds, Fetched: 1 row(s) hive> SELECT to_utc_timestamp('2016-06-30 08:00:00', 'CST'); OK 2016-06-30 13:00:00 ==>Correct, UTC is 5 hours ahead of CST Time taken: 1.776 seconds, Fetched: 1 row(s) hive> SELECT to_utc_timestamp('2016-06-30 09:00:00', 'EST'); OK 2016-06-30 14:00:00 ==>Wrong, UTC should be 4 hours ahead of EST Time taken: 1.686 seconds, Fetched: 1 row(s) hive> select from_utc_timestamp('2016-06-30 14:00:00', 'EST'); OK 2016-06-30 09:00:00 ==>Wrong, UTC should be 4 hours ahead of EST {code} It might be something related to daylight savings time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
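The daylight saving suspicion matches how the JDK resolves the abbreviation. A sketch, assuming the UDFs ultimately resolve zone IDs via java.util.TimeZone.getTimeZone: "EST" maps to a fixed UTC-5 zone that never observes daylight saving, whereas the region-based ID does, which explains the extra hour in June:

```java
import java.util.TimeZone;

public class EstZoneDemo {
    public static void main(String[] args) {
        // "EST" is a fixed-offset zone in the JDK: always UTC-5, never DST.
        TimeZone est = TimeZone.getTimeZone("EST");
        // The region-based ID observes daylight saving (UTC-4 in summer).
        TimeZone ny = TimeZone.getTimeZone("America/New_York");

        System.out.println(est.getRawOffset() / 3_600_000); // -5
        System.out.println(est.useDaylightTime());          // false
        System.out.println(ny.useDaylightTime());           // true
    }
}
```

Using a region ID such as America/New_York instead of the three-letter abbreviation avoids the discrepancy.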
[jira] [Created] (HIVE-14173) NPE was thrown after enabling directsql in the middle of session
Chaoyu Tang created HIVE-14173: -- Summary: NPE was thrown after enabling directsql in the middle of session Key: HIVE-14173 URL: https://issues.apache.org/jira/browse/HIVE-14173 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang hive.metastore.try.direct.sql is initially set to false in the HMS hive-site.xml and is then changed to true using the set metaconf command in the middle of a session; running a query then throws an NPE with the following error message: {code} 2016-07-06T17:44:41,489 ERROR [pool-5-thread-2]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(192)) - MetaException(message:java.lang.NullPointerException) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5741) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rethrowException(HiveMetaStore.java:4771) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4754) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) at com.sun.proxy.$Proxy18.get_partitions_by_expr(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12048) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12032) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.(ObjectStore.java:2667) at org.apache.hadoop.hive.metastore.ObjectStore$GetListHelper.(ObjectStore.java:2825) at org.apache.hadoop.hive.metastore.ObjectStore$4.(ObjectStore.java:2410) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:2410) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:2400) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) at com.sun.proxy.$Proxy17.getPartitionsByExpr(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4749) ... 20 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14281) Issue in decimal multiplication
Chaoyu Tang created HIVE-14281: -- Summary: Issue in decimal multiplication Key: HIVE-14281 URL: https://issues.apache.org/jira/browse/HIVE-14281 Project: Hive Issue Type: Bug Components: Types Reporter: Chaoyu Tang Assignee: Chaoyu Tang {code} CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); INSERT OVERWRITE TABLE test VALUES (20, 20); SELECT a*b from test {code} The returned result is NULL instead of 400. This is because Hive adds the scales of the operands, so the type of a*b is set to decimal(38,36), which leaves only two digits before the decimal point; Hive does not handle this overflow properly (e.g. by rounding). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
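The arithmetic can be checked with java.math.BigDecimal, which follows the same scale-addition rule for multiplication; a sketch of why decimal(38,36) cannot hold 400:

```java
import java.math.BigDecimal;

public class DecimalOverflowDemo {
    public static void main(String[] args) {
        // Two decimal(38,18)-style values: 20 with scale 18.
        BigDecimal a = new BigDecimal("20").setScale(18);
        BigDecimal b = new BigDecimal("20").setScale(18);

        // BigDecimal multiplication adds the operand scales: 18 + 18 = 36.
        BigDecimal product = a.multiply(b);
        System.out.println(product.scale());     // 36

        // 400 needs 3 integer digits; 3 + 36 = 39 significant digits,
        // which exceeds the 38-digit cap, so Hive has no room for the value.
        System.out.println(product.precision()); // 39
    }
}
```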
[jira] [Created] (HIVE-14298) NPE could be thrown in HMS when an ExpressionTree could not be made from a filter
Chaoyu Tang created HIVE-14298: -- Summary: NPE could be thrown in HMS when an ExpressionTree could not be made from a filter Key: HIVE-14298 URL: https://issues.apache.org/jira/browse/HIVE-14298 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang In many cases an ExpressionTree cannot be made from a filter (e.g. the parser fails to parse it) and its value is null. This null is then passed around and used by a couple of HMS methods, which can cause a NullPointerException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14347) Inconsistent behavior in decimal multiplication
Chaoyu Tang created HIVE-14347: -- Summary: Inconsistent behavior in decimal multiplication Key: HIVE-14347 URL: https://issues.apache.org/jira/browse/HIVE-14347 Project: Hive Issue Type: Bug Components: Types Reporter: Chaoyu Tang Assignee: Chaoyu Tang 1. select cast('20' as decimal(38,18)) * cast('10' as decimal(38,18)) from test; returns 200, but the type of the multiplication result is decimal(38,36) as shown in the query plan. 2. select a*b from a table where columns a and b both have type decimal(38,18) and values 20 and 10 respectively; we get result NULL but type decimal(38,36). -- If we strictly followed the current precision/scale rules for decimal multiplication in Hive, the result in case 1 (200) would already exceed the range that decimal(38,36) supports and should also return null. Hive currently deduces the precision/scale from the constant values (20 and 10) and uses (2,0) instead of the specified (38,18) in the multiplication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14359) Spark might fail in LDAP authentication in kerberized cluster
Chaoyu Tang created HIVE-14359: -- Summary: Spark might fail in LDAP authentication in kerberized cluster Key: HIVE-14359 URL: https://issues.apache.org/jira/browse/HIVE-14359 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang When HS2 is used as a gateway for LDAP users to access and run queries in a kerberized cluster, its authentication mode is configured as LDAP, and in that case HoS might fail for the same reason as HIVE-10594. hive.server2.authentication is not a proper property to determine whether a cluster is kerberized; hadoop.security.authentication should be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
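For reference, the property that actually marks a cluster as kerberized lives in core-site.xml; a minimal fragment:

```xml
<!-- core-site.xml: hadoop.security.authentication, not
     hive.server2.authentication, indicates whether the cluster is kerberized. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
```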
[jira] [Created] (HIVE-14395) Add the missing data files to Avro union tests (HIVE-14205 addendum)
Chaoyu Tang created HIVE-14395: -- Summary: Add the missing data files to Avro union tests (HIVE-14205 addendum) Key: HIVE-14395 URL: https://issues.apache.org/jira/browse/HIVE-14395 Project: Hive Issue Type: Bug Components: Test Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial The union_non_nullable.txt & union_nullable.txt were not checked in for HIVE-14205. It was my mistake. It is the reason that testCliDriver_avro_nullable_union & testNegativeCliDriver_avro_non_nullable_union are failing in current pre-commit build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14457) Partitions in encryption zone are still trashed though an exception is returned
Chaoyu Tang created HIVE-14457: -- Summary: Partitions in encryption zone are still trashed though an exception is returned Key: HIVE-14457 URL: https://issues.apache.org/jira/browse/HIVE-14457 Project: Hive Issue Type: Bug Components: Encryption, Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang drop_partition_common in HiveMetaStore still drops partitions in an encryption zone without PURGE even though it returns an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14615) Temp table leaves behind insert command
Chaoyu Tang created HIVE-14615: -- Summary: Temp table leaves behind insert command Key: HIVE-14615 URL: https://issues.apache.org/jira/browse/HIVE-14615 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang {code} create table test (key int, value string); insert into test values (1, 'val1'); show tables; test values__tmp__table__1 {code} The temp table values__tmp__table__1 resulted from insert into ... values and persists until the session is closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14626) Support Trash in Truncate Table
Chaoyu Tang created HIVE-14626: -- Summary: Support Trash in Truncate Table Key: HIVE-14626 URL: https://issues.apache.org/jira/browse/HIVE-14626 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently Truncate Table (or Partition) is implemented as a FileSystem.delete followed by recreating the directory, so: 1. it does not support HDFS Trash; 2. if the table/partition directory is initially encryption protected, it is no longer protected after being deleted and recreated. The new implementation is to clean the contents of the directory using multi-threaded trashFiles. If Trash is enabled and has a lower encryption level than the data directory, the files under it will be deleted; otherwise, they will be trashed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14697) Can not access kerberized HS2 Web UI
Chaoyu Tang created HIVE-14697: -- Summary: Can not access kerberized HS2 Web UI Key: HIVE-14697 URL: https://issues.apache.org/jira/browse/HIVE-14697 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 2.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Failed to access the kerberized HS2 WebUI with the following error msg: {code} curl -v -u : --negotiate http://util185.phx2.cbsig.net:10002/ > GET / HTTP/1.1 > Host: util185.phx2.cbsig.net:10002 > Authorization: Negotiate YIIU7...[redacted]... > User-Agent: curl/7.42.1 > Accept: */* > < HTTP/1.1 413 FULL head < Content-Length: 0 < Connection: close < Server: Jetty(7.6.0.v20120127) {code} It is because the Jetty default request header size (4K) is too small for some Kerberos cases, where the Negotiate token can be large. So this patch increases the request header size to 64K. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14774) Canceling query using Ctrl-C in beeline might lead to stale locks
Chaoyu Tang created HIVE-14774: -- Summary: Canceling query using Ctrl-C in beeline might lead to stale locks Key: HIVE-14774 URL: https://issues.apache.org/jira/browse/HIVE-14774 Project: Hive Issue Type: Bug Components: Locking Reporter: Chaoyu Tang Assignee: Chaoyu Tang Terminating a running query with Ctrl-C in Beeline might lead to stale locks, since the process running the query might still acquire the locks but fail to release them after the query terminates abnormally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14799) Query operations are not thread safe during cancellation
Chaoyu Tang created HIVE-14799: -- Summary: Query operations are not thread safe during cancellation Key: HIVE-14799 URL: https://issues.apache.org/jira/browse/HIVE-14799 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang When a query is cancelled either via Beeline (Ctrl-C) or the API call TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a different thread from the one running the query, in order to close/destroy its encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, which can sometimes result in runtime exceptions such as NPE. Errors from the running query are also not handled properly, which can leave some resources (files, locks, etc.) uncleaned after the query terminates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
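A minimal sketch of the fix direction (class, field, and method names are hypothetical, not Hive's actual code): serialize access to the shared driver state so the cancel thread cannot destroy it between the query thread's null check and its use:

```java
// Hypothetical stand-in for SQLOperation: the query thread and the cancel
// thread both touch `driver`, so access is serialized on a single lock.
public class OperationSketch {
    private StringBuilder driver = new StringBuilder("compiled-plan");
    private final Object driverLock = new Object();

    // Called on the query-execution thread.
    public String run() {
        synchronized (driverLock) {
            // Without the lock, cancel() could null the field between the
            // null check and the use, producing the NPEs described above.
            return driver == null ? null : driver.toString();
        }
    }

    // Called on the HS2 cancel thread (Beeline Ctrl-C / CancelOperation).
    public void cancel() {
        synchronized (driverLock) {
            driver = null; // destroy the driver only while holding the lock
        }
    }
}
```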
[jira] [Created] (HIVE-14874) Master: Update errata.txt for missing JIRA number in HIVE-9423 commit msg
Chaoyu Tang created HIVE-14874: -- Summary: Master: Update errata.txt for missing JIRA number in HIVE-9423 commit msg Key: HIVE-14874 URL: https://issues.apache.org/jira/browse/HIVE-14874 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial The JIRA number is missing in the commit msg for the master branch, see https://issues.apache.org/jira/browse/HIVE-9423?focusedCommentId=15537841&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15537841 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14930) RuntimeException was seen in explainanalyze_3.q test log
Chaoyu Tang created HIVE-14930: -- Summary: RuntimeException was seen in explainanalyze_3.q test log Key: HIVE-14930 URL: https://issues.apache.org/jira/browse/HIVE-14930 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Priority: Minor When working on HIVE-14799, I noticed there were some RuntimeExceptions when running explainanalyze_3.q and explainanalyze_5.q, though these tests showed as successful. {code} 2016-10-10T19:02:48,455 ERROR [aa5c6743-b5de-40fc-82da-5dde0e6b387f main] ql.Driver: FAILED: Hive Internal Error: java.lang.RuntimeException(Cannot overwrite read-only table: src) java.lang.RuntimeException: Cannot overwrite read-only table: src at org.apache.hadoop.hive.ql.hooks.EnforceReadOnlyTables.run(EnforceReadOnlyTables.java:74) at org.apache.hadoop.hive.ql.hooks.EnforceReadOnlyTables.run(EnforceReadOnlyTables.java:56) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1736) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1505) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1218) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1208) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:106) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:251) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:504) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1298) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1436) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1218) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1208) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1319) at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1293) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver(TestMiniTezCliDriver.java:59) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:73) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:367
[jira] [Created] (HIVE-15043) HMS supports Oracle 12c as its backend database
Chaoyu Tang created HIVE-15043: -- Summary: HMS supports Oracle 12c as its backend database Key: HIVE-15043 URL: https://issues.apache.org/jira/browse/HIVE-15043 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang HMS does not work with Oracle 12c using its JDBC driver ojdbc7 (12.1.0.2 or 12.1.0.1) in any Hive version prior to 2.0. It hangs when it connects to Oracle 12c due to an issue in DataNucleus 3.2 with ojdbc7. With DN upgraded to 4.2.x in Hive 2.0 (see HIVE-6113), we need to find out whether its HMS supports 12c and its ojdbc7 drivers. If not, we should find a way in Hive to make it supported if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15059) Flaky test: TestSemanticAnalysis#testAlterTableRename
Chaoyu Tang created HIVE-15059: -- Summary: Flaky test: TestSemanticAnalysis#testAlterTableRename Key: HIVE-15059 URL: https://issues.apache.org/jira/browse/HIVE-15059 Project: Hive Issue Type: Sub-task Components: HCatalog, Tests Reporter: Chaoyu Tang The default database location in testAlterTableRename is not the one specified by TEST_WAREHOUSE_DIR in HCatBaseTest. It looks like the following in the precommit build: {code} pfile:/home/hiveptest/104.197.110.94-hiveptest-0/apache-github-source-source/hcatalog/core/target/warehouse {code} But the TEST_WAREHOUSE_DIR should actually be like: {code} file:/home/hiveptest/104.197.110.94-hiveptest-0/apache-github-source-source/hcatalog/core/build/test/data/org.apache.hive.hcatalog.mapreduce.HCatBaseTest-1477389203834/warehouse/oldname {code} This only happened in the precommit build, not in the local environment. We need to investigate the issue since it causes failures with HIVE-14909. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15091) Master: Update errata.txt for the missing JIRA number in HIVE-14909 commit msg
Chaoyu Tang created HIVE-15091: -- Summary: Master: Update errata.txt for the missing JIRA number in HIVE-14909 commit msg Key: HIVE-15091 URL: https://issues.apache.org/jira/browse/HIVE-15091 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.2.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Missing the JIRA number in commit msg for master branch, see https://issues.apache.org/jira/browse/HIVE-14909?focusedCommentId=15614056&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15614056 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15109) Set MaxPermSize to 256M for maven tests
Chaoyu Tang created HIVE-15109: -- Summary: Set MaxPermSize to 256M for maven tests Key: HIVE-15109 URL: https://issues.apache.org/jira/browse/HIVE-15109 Project: Hive Issue Type: Test Components: Test Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Trying to run the qtests, for example mvn test -Dtest=TestMiniTezCliDriver -Dqfile=explainanalyze_1.q, I got {code} Running org.apache.hadoop.hive.cli.TestMiniTezCliDriver Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.591 sec - in org.apache.hadoop.hive.cli.TestMiniTezCliDriver {code} Looking into hive.log, I found that it was due to too small a PermGen space: {code} 2016-11-01T19:52:19,039 ERROR [org.apache.hadoop.util.JvmPauseMonitor$Monitor@261e733f] server.NIOServerCnxnFactory: Thread Thread[org.apache.hadoop.util.JvmPauseMonitor$Monitor@261e733f,5,main] died java.lang.OutOfMemoryError: PermGen space {code} Setting env MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=256M" does not help. We can set MaxPermSize in maven.test.jvm.args in pom.xml instead: {code} -Xmx2048m -XX:MaxPermSize=256M {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
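A sketch of the pom.xml change, assuming the surefire argLine is wired to the maven.test.jvm.args property as the issue describes:

```xml
<properties>
  <!-- PermGen applies to JDK 7 and earlier; -XX:MaxPermSize is ignored on JDK 8+. -->
  <maven.test.jvm.args>-Xmx2048m -XX:MaxPermSize=256M</maven.test.jvm.args>
</properties>
```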
[jira] [Created] (HIVE-15341) Get work path instead of attempted task path in HiveHFileOutputFormat
Chaoyu Tang created HIVE-15341: -- Summary: Get work path instead of attempted task path in HiveHFileOutputFormat Key: HIVE-15341 URL: https://issues.apache.org/jira/browse/HIVE-15341 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor It would be more robust to use FileOutputCommitter.getWorkPath instead of FileOutputCommitter.getTaskAttemptPath. getTaskAttemptPath is the same as getWorkPath in the new MR2 APIs but is missing from the old MR1 APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15410) WebHCat supports get/set table property with its name containing period and hyphen
Chaoyu Tang created HIVE-15410: -- Summary: WebHCat supports get/set table property with its name containing period and hyphen Key: HIVE-15410 URL: https://issues.apache.org/jira/browse/HIVE-15410 Project: Hive Issue Type: Improvement Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Hive table properties can have a period (.) or hyphen (-) in their names; auto.purge is one example. But the WebHCat APIs support neither setting nor getting these properties, and throw the error msg "Invalid DDL identifier :property". For example: {code} [root@ctang-1 ~]# curl -s 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key1?user.name=hiveuser' {"error":"Invalid DDL identifier :property"} [root@ctang-1 ~]# curl -s -X PUT -HContent-type:application/json -d '{ "value": "true" }' 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key2?user.name=hiveuser/' {"error":"Invalid DDL identifier :property"} {code} This patch adds support for property names containing a period and/or hyphen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
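The shape of the fix can be illustrated with regular expressions (these patterns are illustrative, not WebHCat's actual validation code): a word-character-only identifier check rejects auto.purge, while one relaxed to also allow periods and hyphens accepts it:

```java
import java.util.regex.Pattern;

public class PropNameDemo {
    // Illustrative patterns, not WebHCat's real validator.
    static final Pattern STRICT  = Pattern.compile("\\w+");       // letters, digits, _
    static final Pattern RELAXED = Pattern.compile("[\\w.\\-]+"); // also . and -

    public static void main(String[] args) {
        System.out.println(STRICT.matcher("auto.purge").matches());  // false
        System.out.println(RELAXED.matcher("auto.purge").matches()); // true
        System.out.println(RELAXED.matcher("prop-key").matches());   // true
    }
}
```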
[jira] [Created] (HIVE-15446) Hive fails in recursive debug
Chaoyu Tang created HIVE-15446: -- Summary: Hive fails in recursive debug Key: HIVE-15446 URL: https://issues.apache.org/jira/browse/HIVE-15446 Project: Hive Issue Type: Bug Components: Diagnosability Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor When running Hive in recursive debug mode, for example ./bin/hive --debug:port=10008,childSuspend=y, it fails with the error msg: -- ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.Error occurred during initialization of VM agent library failed to init: jdwp -- It is because HADOOP_OPTS and HADOOP_CLIENT_OPTS both carry JVM debug options when HADOOP.sh is invoked for the child process. HADOOP_CLIENT_OPTS is appended to HADOOP_OPTS in HADOOP.sh, which leads to the duplicated debug options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15485) Investigate the DoAs failure in HoS
Chaoyu Tang created HIVE-15485: -- Summary: Investigate the DoAs failure in HoS Key: HIVE-15485 URL: https://issues.apache.org/jira/browse/HIVE-15485 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang With DoAs enabled, HoS failed with the following errors: {code} Exception in thread "main" org.apache.hadoop.security.AccessControlException: systest tries to renew a token with renewer hive at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7543) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:555) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:674) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:999) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135) {code} It is related to the change from HIVE-14383. It looks like SparkSubmit logs in to Kerberos with the passed-in hive principal/keytab and then tries to create an HDFS delegation token for user systest with renewer hive. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15742) Column stats should be preserved when it is renamed
Chaoyu Tang created HIVE-15742:
-------------------------------

Summary: Column stats should be preserved when it is renamed
Key: HIVE-15742
URL: https://issues.apache.org/jira/browse/HIVE-15742
Project: Hive
Issue Type: Improvement
Components: Statistics
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

Currently, when a column is renamed, its stats are deleted. Recreating them could be expensive, and we should preserve them if possible.
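For reference, the stats loss described above can be observed with a rename like the following (the table and column names here are made up for illustration):
{code}
-- hypothetical table; column 'id' is renamed to 'emp_id'
CREATE TABLE t (id INT, name STRING);
ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS;
-- after this rename, the column stats gathered for 'id' are deleted
-- instead of being carried over to 'emp_id'
ALTER TABLE t CHANGE id emp_id INT;
DESCRIBE FORMATTED t emp_id;   -- min/max/num_nulls etc. no longer shown
{code}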
[jira] [Created] (HIVE-15815) Allow to pass some Oozie properties to Spark in HoS
Chaoyu Tang created HIVE-15815:
-------------------------------

Summary: Allow to pass some Oozie properties to Spark in HoS
Key: HIVE-15815
URL: https://issues.apache.org/jira/browse/HIVE-15815
Project: Hive
Issue Type: Improvement
Components: Diagnosability, Spark
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
Priority: Minor

Oozie passes some of its properties (e.g. oozie.job.id) to Beeline/HS2 when it invokes the Hive2 action. If we allow these properties to be passed on to Spark in HoS, we can easily associate an Oozie workflow ID with an HoS client and its Spark job in the Spark history. This would be very helpful in diagnosing issues involving Oozie Hive2/HoS/Spark.
[jira] [Created] (HIVE-15966) Query column alias fails in order by
Chaoyu Tang created HIVE-15966:
-------------------------------

Summary: Query column alias fails in order by
Key: HIVE-15966
URL: https://issues.apache.org/jira/browse/HIVE-15966
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

Query:
{code}
select mtg.marketing_type_group_desc as marketing_type_group
from marketing_type_group mtg
order by mtg.marketing_type_group_desc;
{code}
fails with error:
{code}
2017-02-17T11:22:11,441 ERROR [eb89eafb-e100-42b1-8ff1-b3332b2e715f main]: ql.Driver (SessionState.java:printError(1116)) - FAILED: SemanticException [Error 10004]: Line 7:9 Invalid table alias or column reference 'marketing_type_group_desc': (possible column names are: marketing_type_group, prod_type)
org.apache.hadoop.hive.ql.parse.SemanticException: Line 7:9 Invalid table alias or column reference 'marketing_type_group_desc': (possible column names are: marketing_type_group, prod_type)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11501)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11449)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11417)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11395)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7761)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9655)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9554)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10450)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10328)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:11011)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:478)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11022)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:285)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:514)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1319)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1459)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1239)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1229)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code}
[jira] [Created] (HIVE-16019) Query fails when group by/order by on same column with uppercase name
Chaoyu Tang created HIVE-16019:
-------------------------------

Summary: Query fails when group by/order by on same column with uppercase name
Key: HIVE-16019
URL: https://issues.apache.org/jira/browse/HIVE-16019
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

A query with group by/order by on the same column KEY failed:
{code}
SELECT T1.KEY AS MYKEY FROM SRC T1 GROUP BY T1.KEY ORDER BY T1.KEY LIMIT 3;
{code}
[jira] [Created] (HIVE-16071) Spark remote driver misuses the timeout in RPC handshake
Chaoyu Tang created HIVE-16071:
-------------------------------

Summary: Spark remote driver misuses the timeout in RPC handshake
Key: HIVE-16071
URL: https://issues.apache.org/jira/browse/HIVE-16071
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

Based on its property description in HiveConf and the comments in HIVE-12650 (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979), hive.spark.client.connect.timeout is the timeout for the Spark remote driver to make a socket connection (channel) to the RPC server. But currently it is also used by the remote driver for RPC client/server handshaking, which is not right. Instead, hive.spark.client.server.connect.timeout should be used, and it is already used by the RpcServer in the handshake. An error like the following is usually caused by this issue, since the default hive.spark.client.connect.timeout value (1000ms) used by the remote driver for the handshake is too short.
{code}
17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
	at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
	at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
	at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
	at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
{code}
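Until the fix lands, one possible workaround sketch is to raise the client-side timeout that the remote driver currently (mis)uses for the handshake. The property names are the two discussed above; the values below are illustrative, not recommended defaults:
{code}
<!-- hive-site.xml; illustrative values only -->
<property>
  <name>hive.spark.client.connect.timeout</name>
  <!-- default 1000ms; too short while it is wrongly used for the handshake -->
  <value>30000ms</value>
</property>
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <!-- the timeout that should actually govern the handshake -->
  <value>90000ms</value>
</property>
{code}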
[jira] [Created] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
Chaoyu Tang created HIVE-16147:
-------------------------------

Summary: Rename a partitioned table should not drop its partition columns stats
Key: HIVE-16147
URL: https://issues.apache.org/jira/browse/HIVE-16147
Project: Hive
Issue Type: Bug
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

When a partitioned table (e.g. sample_pt) is renamed (e.g. to sample_pt_rename), describing its partition shows that the partition column stats are still accurate, but actually they have all been dropped. It can be reproduced as follows:

1. analyze table sample_pt compute statistics for columns;
2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS for all columns are true
{code}
...
# Detailed Partition Information
Partition Value:	[3]
Database:       	default
Table:          	sample_pt
CreateTime:     	Fri Jan 20 15:42:30 EST 2017
LastAccessTime: 	UNKNOWN
Location:       	file:/user/hive/warehouse/apache/sample_pt/dummy=3
Partition Parameters:
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
	last_modified_by     	ctang
	last_modified_time   	1485217063
	numFiles             	1
	numRows              	100
	rawDataSize          	5143
	totalSize            	5243
	transient_lastDdlTime	1488842358
...
{code}
3. describe formatted default.sample_pt partition (dummy = 3) salary: column stats exist
{code}
# col_name	data_type	min	max   	num_nulls	distinct_count	avg_col_len	max_col_len	num_trues	num_falses	comment
salary    	int      	1  	151370	0        	94            	           	           	         	          	from deserializer
{code}
4. alter table sample_pt rename to sample_pt_rename;
5. describe formatted default.sample_pt_rename partition (dummy = 3): describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS for the columns are still true.
{code}
# Detailed Partition Information
Partition Value:	[3]
Database:       	default
Table:          	sample_pt_rename
CreateTime:     	Fri Jan 20 15:42:30 EST 2017
LastAccessTime: 	UNKNOWN
Location:       	file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
Partition Parameters:
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
	last_modified_by     	ctang
	last_modified_time   	1485217063
	numFiles             	1
	numRows              	100
	rawDataSize          	5143
	totalSize            	5243
	transient_lastDdlTime	1488842358
{code}
describe formatted default.sample_pt_rename partition (dummy = 3) salary: the column stats have been dropped.
{code}
# col_name	data_type	comment
salary    	int      	from deserializer
Time taken: 0.131 seconds, Fetched: 3 row(s)
{code}
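The steps above can be condensed into a single repro script (database, table, and partition names as in the description):
{code}
ANALYZE TABLE sample_pt COMPUTE STATISTICS FOR COLUMNS;
DESCRIBE FORMATTED default.sample_pt PARTITION (dummy = 3) salary;
-- column stats present (min/max/num_nulls/distinct_count)
ALTER TABLE sample_pt RENAME TO sample_pt_rename;
DESCRIBE FORMATTED default.sample_pt_rename PARTITION (dummy = 3) salary;
-- column stats gone, yet COLUMN_STATS_ACCURATE in the partition
-- parameters still claims they are true
{code}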
[jira] [Created] (HIVE-16189) Table column stats might be invalidated in a failed table rename
Chaoyu Tang created HIVE-16189:
-------------------------------

Summary: Table column stats might be invalidated in a failed table rename
Key: HIVE-16189
URL: https://issues.apache.org/jira/browse/HIVE-16189
Project: Hive
Issue Type: Bug
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

If a table rename does not succeed because moving the data to the new renamed table folder fails, the changes in TAB_COL_STATS are not rolled back, which leaves invalid column stats behind.
[jira] [Created] (HIVE-16394) HoS does not support queue name change in middle of session
Chaoyu Tang created HIVE-16394:
-------------------------------

Summary: HoS does not support queue name change in middle of session
Key: HIVE-16394
URL: https://issues.apache.org/jira/browse/HIVE-16394
Project: Hive
Issue Type: Bug
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

mapreduce.job.queuename only takes effect when HoS executes its first query. After that, changing mapreduce.job.queuename does not change the YARN scheduler queue used by subsequent queries.
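For example, in a single Beeline session (the queue names and table are made up for illustration):
{code}
SET mapreduce.job.queuename=etl;
SELECT count(*) FROM src;    -- first HoS query; Spark app starts on queue 'etl'
SET mapreduce.job.queuename=adhoc;
SELECT count(*) FROM src;    -- still runs on 'etl': the new queue name is
                             -- ignored because the Spark session is already open
{code}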