[jira] [Created] (HIVE-9920) DROP DATABASE IF EXISTS throws exception if database does not exist
Chaoyu Tang created HIVE-9920: - Summary: DROP DATABASE IF EXISTS throws exception if database does not exist Key: HIVE-9920 URL: https://issues.apache.org/jira/browse/HIVE-9920 Project: Hive Issue Type: Bug Components: Logging, Metastore Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor "drop database if exists noexistingdb" throws and logs a full exception if the database (noexistingdb) does not exist: {code}
15/03/10 22:47:22 WARN metastore.ObjectStore: Failed to get database statsdb2, returning NoSuchObjectException
15/03/11 00:19:55 ERROR metastore.RetryingHMSHandler: NoSuchObjectException(message:statsdb2)
	at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:569)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
	at com.sun.proxy.$Proxy6.getDatabase(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:953)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:927)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy8.get_database(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1150)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
	at com.sun.proxy.$Proxy9.getDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1291)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:1364)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropDatabase(DDLSemanticAnalyzer.java:777)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:427)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1116)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1164)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1053)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9975) Renaming a nonexisting partition should not throw out NullPointerException
Chaoyu Tang created HIVE-9975: - Summary: Renaming a nonexisting partition should not throw out NullPointerException Key: HIVE-9975 URL: https://issues.apache.org/jira/browse/HIVE-9975 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Renaming a nonexisting partition should not throw a NullPointerException. {code}
create table testpart (col1 int, col2 string, col3 string) partitioned by (part string);
alter table testpart partition (part = 'nonexisting') rename to partition (part = 'existing');
{code} We get an NPE like the following: {code}
15/03/16 10:16:11 ERROR exec.DDLTask: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.DDLTask.renamePartition(DDLTask.java:944)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:350)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1642)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1402)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1187)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1053)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. null
{code}
[jira] [Created] (HIVE-10007) Support qualified table name in analyze table compute statistics for columns
Chaoyu Tang created HIVE-10007: -- Summary: Support qualified table name in analyze table compute statistics for columns Key: HIVE-10007 URL: https://issues.apache.org/jira/browse/HIVE-10007 Project: Hive Issue Type: Improvement Components: Query Processor, Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently the "analyze table compute statistics for columns" command cannot compute column stats for a table in a different database, since it does not support qualified table names. You need to switch to that table's database in order to compute its column stats. For example, you have to "use psqljira" and then "analyze table src compute statistics for columns" for the table src under psqljira. This JIRA will add support for qualified table names in the analyze column stats command.
[jira] [Created] (HIVE-10210) Compute partition column stats fails when partition value is zero-leading integer
Chaoyu Tang created HIVE-10210: -- Summary: Compute partition column stats fails when partition value is zero-leading integer Key: HIVE-10210 URL: https://issues.apache.org/jira/browse/HIVE-10210 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang The command "Analyze table .. partition compute statistics for columns" fails if the partition value is a non-normalized integer, e.g. one with leading zeros. For example: {code}
create table colstatspartint (key int, value string) partitioned by (part int);
insert into colstatspartint partition (part='0003') select key, value from src limit 30;
analyze table colstatspartint partition (part='0003') compute statistics for columns;
{code} or {code}
analyze table colstatspartint partition (part=0003) compute statistics for columns;
{code} You will get the error: {code}
15/04/03 10:13:19 ERROR metastore.RetryingHMSHandler: NoSuchObjectException(message:Partition for which stats is gathered doesn't exist.)
	at org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:5952)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
	at com.sun.proxy.$Proxy6.updatePartitionColumnStatistics(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_partition_column_statistics(HiveMetaStore.java:4346)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:5678)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
{code}
[jira] [Created] (HIVE-10231) Compute partition column stats fails if partition col type is date
Chaoyu Tang created HIVE-10231: -- Summary: Compute partition column stats fails if partition col type is date Key: HIVE-10231 URL: https://issues.apache.org/jira/browse/HIVE-10231 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Fix For: 1.2.0 Currently the command "analyze table .. partition .. compute statistics for columns" only works for partition columns of string and numeric types, but not others like date. The following case uses date as the partition column type: {code}
create table colstatspartdate (key int, value string) partitioned by (ds date, hr int);
insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20;
analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns;
{code} You will get a RuntimeException: {code}
FAILED: RuntimeException Cannot convert to Date from: int
15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int
java.lang.RuntimeException: Cannot convert to Date from: int
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
{code}
[jira] [Created] (HIVE-10307) Support to use number literals in partition column
Chaoyu Tang created HIVE-10307: -- Summary: Support to use number literals in partition column Key: HIVE-10307 URL: https://issues.apache.org/jira/browse/HIVE-10307 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as literals with a postfix like Y, S, L, or BD appended to the number. These literals work in most Hive queries, but do not when they are used as partition column values. For a partitioned table like: {code}
create table partcoltypenum (key int, value string) partitioned by (tint tinyint, sint smallint, bint bigint);
insert into partcoltypenum partition (tint=100Y, sint=1S, bint=1000L) select key, value from src limit 30;
{code} queries like select, describe and drop partition do not work. For example: {code}
select * from partcoltypenum where tint=100Y and sint=1S and bint=1000L;
{code} does not return any rows.
[jira] [Created] (HIVE-10313) Literal Decimal ExprNodeConstantDesc should contain value of HiveDecimal instead of String
Chaoyu Tang created HIVE-10313: -- Summary: Literal Decimal ExprNodeConstantDesc should contain value of HiveDecimal instead of String Key: HIVE-10313 URL: https://issues.apache.org/jira/browse/HIVE-10313 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang In TypeCheckProcFactory.NumExprProcessor, the ExprNodeConstantDesc is created from strVal: {code}
else if (expr.getText().endsWith("BD")) {
  // Literal decimal
  String strVal = expr.getText().substring(0, expr.getText().length() - 2);
  HiveDecimal hd = HiveDecimal.create(strVal);
  int prec = 1;
  int scale = 0;
  if (hd != null) {
    prec = hd.precision();
    scale = hd.scale();
  }
  DecimalTypeInfo typeInfo = TypeInfoFactory.getDecimalTypeInfo(prec, scale);
  return new ExprNodeConstantDesc(typeInfo, strVal);
}
{code} It should use the HiveDecimal instead: return new ExprNodeConstantDesc(typeInfo, hd);
[jira] [Created] (HIVE-10322) TestJdbcWithMiniHS2.testNewConnectionConfiguration fails
Chaoyu Tang created HIVE-10322: -- Summary: TestJdbcWithMiniHS2.testNewConnectionConfiguration fails Key: HIVE-10322 URL: https://issues.apache.org/jira/browse/HIVE-10322 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial The test org.apache.hive.jdbc.TestJdbcWithMiniHS2.testNewConnectionConfiguration fails with the following error: {code}
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: org.apache.hive.service.cli.HiveSQLException: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.http.max.worker.threads does not exists.
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:243)
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:234)
	at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:513)
	at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:188)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
	at java.sql.DriverManager.getConnection(DriverManager.java:571)
	at java.sql.DriverManager.getConnection(DriverManager.java:233)
	at org.apache.hive.jdbc.TestJdbcWithMiniHS2.testNewConnectionConfiguration(TestJdbcWithMiniHS2.java:275)
Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: org.apache.hive.service.cli.HiveSQLException: java.lang.IllegalArgumentException: hive configuration hive.server2.thrift.http.max.worker.threads does not exists.
{code} It seems related to HIVE-10271 (removal of the hive.server2.thrift.http.min/max.worker.threads properties).
[jira] [Created] (HIVE-10362) Support Type check/conversion in dynamic partition column
Chaoyu Tang created HIVE-10362: -- Summary: Support Type check/conversion in dynamic partition column Key: HIVE-10362 URL: https://issues.apache.org/jira/browse/HIVE-10362 Project: Hive Issue Type: Improvement Components: Query Processor, Types Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang There are quite a lot of issues associated with non-normalized or type-mismatched values for partition columns, and Hive has many ways to introduce such problematic data. HIVE-10307 mainly provides the support to type check/convert/normalize the partition column value in the static partition specification. This JIRA deals with the partition column type in dynamic partition insert. Currently any data can be inserted as a partition column value as long as it is quoted as a string. For example: {code}
create table dynparttypechecknum (key int, value string) partitioned by (part int);
insert into dynparttypechecknum partition (part) select key, value, '1' from src limit 1;
show partitions dynparttypechecknum;
-- part=1
{code} The partition column value is a non-normalized int 1. It causes some unnecessary problems such as integer partition column JDO filter pushdown (see HIVE-6052) and others like HIVE-10210.
[jira] [Created] (HIVE-10363) Provide a way to normalize the legacy partition column values
Chaoyu Tang created HIVE-10363: -- Summary: Provide a way to normalize the legacy partition column values Key: HIVE-10363 URL: https://issues.apache.org/jira/browse/HIVE-10363 Project: Hive Issue Type: Improvement Components: Types Reporter: Chaoyu Tang Assignee: Chaoyu Tang We have seen a lot of issues caused by non-normalized partition column values, such as HIVE-10210, HIVE-6052, etc. Besides type checking, converting and normalizing the partition column values in insert/alter partition operations (see HIVE-10307, HIVE-10362), we need to provide an easy way for users to normalize their legacy partition column data. HIVE-5700 attempted to do this at the metastore SQL level, but given the many flavors and versions of backend databases, that is quite hard to achieve and also error prone. The SQL portion of the change in HIVE-5700 has been reverted by HIVE-9445/HIVE-9509. Currently "alter table .. partition ... rename" could be used to normalize the partition column for each partition, but I am considering whether there is a more convenient and better way to do that.
[jira] [Created] (HIVE-10541) Beeline requires newline at the end of each query in a file
Chaoyu Tang created HIVE-10541: -- Summary: Beeline requires newline at the end of each query in a file Key: HIVE-10541 URL: https://issues.apache.org/jira/browse/HIVE-10541 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Beeline requires a newline at the end of each query in a file.
[jira] [Created] (HIVE-10571) HiveMetaStoreClient should close existing thrift connection before its reconnect
Chaoyu Tang created HIVE-10571: -- Summary: HiveMetaStoreClient should close existing thrift connection before its reconnect Key: HIVE-10571 URL: https://issues.apache.org/jira/browse/HIVE-10571 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang HiveMetaStoreClient should first close its existing thrift connection, whether it is already dead or still alive, before opening another connection in its reconnect() method. Otherwise, it might lead to huge resource accumulation or leaks on the HMS side when a client keeps retrying.
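The close-before-reopen behavior proposed above can be sketched in plain Java. StubTransport and ReconnectDemo below are illustrative stand-ins (not the actual HiveMetaStoreClient or Thrift classes); a static counter of live connections shows how skipping the close leaks one connection per retry:

```java
// A stand-in for a Thrift transport; the real client wraps a TTransport,
// but the close-before-reopen pattern is the same.
class StubTransport {
    static int openCount = 0;          // live connections, to make the leak visible
    private boolean open;
    void open()  { if (!open) { open = true;  openCount++; } }
    void close() { if (open)  { open = false; openCount--; } }
}

public class ReconnectDemo {
    private StubTransport transport = new StubTransport();

    public ReconnectDemo() { transport.open(); }

    // Leaky variant: forgets the old transport without closing it,
    // so the server-side resource is never released.
    public void leakyReconnect() {
        transport = new StubTransport();
        transport.open();
    }

    // Proposed behavior: close the existing connection first, whether
    // dead or alive, then open a fresh one.
    public void safeReconnect() {
        transport.close();
        transport = new StubTransport();
        transport.open();
    }
}
```

After five retries, the leaky variant holds six live connections while the safe variant holds exactly one.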
[jira] [Created] (HIVE-10587) ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column
Chaoyu Tang created HIVE-10587: -- Summary: ExprNodeColumnDesc should be created with isPartitionColOrVirtualCol true for DP column Key: HIVE-10587 URL: https://issues.apache.org/jira/browse/HIVE-10587 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor In the SemanticAnalyzer method Operator genConversionSelectOperator(String dest, QB qb, Operator input, TableDesc table_desc, DynamicPartitionCtx dpCtx) throws SemanticException, the DP column's ExprNodeColumnDesc is created by passing false as the isPartitionColOrVirtualCol parameter value: {code}
// DP columns starts with tableFields.size()
for (int i = tableFields.size() + (updating() ? 1 : 0); i < rowFields.size(); ++i) {
  TypeInfo rowFieldTypeInfo = rowFields.get(i).getType();
  ExprNodeDesc column = new ExprNodeColumnDesc(
      rowFieldTypeInfo, rowFields.get(i).getInternalName(), "", false);
  expressions.add(column);
}
{code} I think it should be true instead.
[jira] [Created] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()
Chaoyu Tang created HIVE-10620: -- Summary: ZooKeeperHiveLock overrides equal() method but not hashcode() Key: HIVE-10620 URL: https://issues.apache.org/jira/browse/HIVE-10620 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang ZooKeeperHiveLock overrides the public boolean equals(Object o) method but not public int hashCode(). This violates the Java contract that equal objects must have equal hash codes, and may cause unexpected results.
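A minimal illustration of the violated contract (BrokenLock and FixedLock are hypothetical stand-ins for ZooKeeperHiveLock): a class that overrides equals() without hashCode() can make hash-based collections misbehave, because two equal locks may land in different hash buckets:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Mirrors the bug: equals() overridden, hashCode() left at identity hash.
class BrokenLock {
    final String path;
    BrokenLock(String path) { this.path = path; }
    @Override public boolean equals(Object o) {
        return o instanceof BrokenLock && ((BrokenLock) o).path.equals(path);
    }
    // hashCode() deliberately not overridden
}

// The fix: hashCode() derived from the same field(s) as equals().
class FixedLock extends BrokenLock {
    FixedLock(String path) { super(path); }
    @Override public int hashCode() { return Objects.hash(path); }
}

public class LockHashDemo {
    // Often returns false: the two distinct instances usually have
    // different identity hash codes, so contains() looks in the wrong bucket.
    public static boolean brokenContains() {
        Set<BrokenLock> s = new HashSet<>();
        s.add(new BrokenLock("/hive/db/tbl"));
        return s.contains(new BrokenLock("/hive/db/tbl"));
    }

    // Always true once hashCode() agrees with equals().
    public static boolean fixedContains() {
        Set<FixedLock> s = new HashSet<>();
        s.add(new FixedLock("/hive/db/tbl"));
        return s.contains(new FixedLock("/hive/db/tbl"));
    }
}
```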
[jira] [Created] (HIVE-10835) Concurrency issues in JDBC driver
Chaoyu Tang created HIVE-10835: -- Summary: Concurrency issues in JDBC driver Key: HIVE-10835 URL: https://issues.apache.org/jira/browse/HIVE-10835 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Though the JDBC specification states that "Each Connection object can create multiple Statement objects that may be used concurrently by the program", that does not work in the current Hive JDBC driver. In addition, there are also race conditions between DatabaseMetaData, Statement and ResultSet, since they all make RPC calls to HS2 over the same Thrift transport within a connection. So we need a connection-level lock to serialize all these RPC calls in a connection.
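The direction of the fix can be sketched as follows (LockedConnection and rpc() are illustrative names, not the actual HiveConnection API): every RPC that would share the connection's Thrift transport synchronizes on one connection-level lock, so at most one call is ever on the wire at a time:

```java
// Sketch of a connection-level lock serializing RPCs that share one transport.
public class LockedConnection {
    private final Object transportLock = new Object();
    private int inFlight = 0;     // calls currently "on the wire"
    private int maxInFlight = 0;  // should never exceed 1 under the lock

    // Stands in for any call from Statement/DatabaseMetaData/ResultSet
    // that goes over the shared transport.
    public String rpc(String request) throws InterruptedException {
        synchronized (transportLock) {
            inFlight++;
            maxInFlight = Math.max(maxInFlight, inFlight);
            Thread.sleep(1);      // simulate wire time inside the critical section
            inFlight--;
            return "ok:" + request;
        }
    }

    public int maxObserved() { return maxInFlight; }
}
```

Even with several threads (several Statement objects) hammering the same connection, maxObserved() stays at 1.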
[jira] [Created] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
Chaoyu Tang created HIVE-10976: -- Summary: Redundant HiveMetaStore connect check in HS2 CLIService start Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial During HS2 startup, CLIService start() performs a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, where it starts a SessionState and already establishes a connection to HMS.
[jira] [Created] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
Chaoyu Tang created HIVE-10977: -- Summary: No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled Key: HIVE-10977 URL: https://issues.apache.org/jira/browse/HIVE-10977 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor When hive.metastore.try.direct.sql is set to false, HMS will use JDO to retrieve data, therefore it is not necessary to instantiate the expensive MetaStoreDirectSql during ObjectStore initialization.
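A sketch of the idea, assuming a simple Properties-based config: the config key hive.metastore.try.direct.sql is real, but StoreInit and DirectSqlHelper below are stand-ins for ObjectStore and MetaStoreDirectSql, not the actual classes. The expensive helper is constructed only when direct SQL is enabled:

```java
import java.util.Properties;

public class StoreInit {
    static int directSqlInstances = 0;   // counts expensive constructions

    // Stand-in for MetaStoreDirectSql, whose construction is costly.
    static class DirectSqlHelper {
        DirectSqlHelper() { directSqlInstances++; }
    }

    private DirectSqlHelper directSql;

    public StoreInit(Properties conf) {
        // Hive defaults hive.metastore.try.direct.sql to true.
        boolean tryDirectSql = Boolean.parseBoolean(
            conf.getProperty("hive.metastore.try.direct.sql", "true"));
        if (tryDirectSql) {              // skip the costly setup when disabled
            directSql = new DirectSqlHelper();
        }
    }

    public boolean hasDirectSql() { return directSql != null; }
}
```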
[jira] [Created] (HIVE-11100) Beeline should escape semi-colon in queries
Chaoyu Tang created HIVE-11100: -- Summary: Beeline should escape semi-colon in queries Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Improvement Components: Beeline Affects Versions: 1.2.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Beeline should escape semicolons in queries. For example, queries like the following: {code}
CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n';
{code} or {code}
CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n';
{code} both fail. But the second query, with the semicolon escaped by "\", works in the CLI.
[jira] [Created] (HIVE-11157) Hive.get(HiveConf) returns same Hive object to different user sessions
Chaoyu Tang created HIVE-11157: -- Summary: Hive.get(HiveConf) returns same Hive object to different user sessions Key: HIVE-11157 URL: https://issues.apache.org/jira/browse/HIVE-11157 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.3.0, 2.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently, Hive.get(HiveConf) creates and returns a new Hive object when the thread-local Hive is null or the HMS config is not compatible, but it does not do so when it is called in a thread that has been switched to execute a session with a different userId. This causes an impersonation issue with HMS. It is related to HIVE-7890.
[jira] [Created] (HIVE-11666) Discrepancy in INSERT OVERWRITE LOCAL DIRECTORY between Beeline and CLI
Chaoyu Tang created HIVE-11666: -- Summary: Discrepancy in INSERT OVERWRITE LOCAL DIRECTORY between Beeline and CLI Key: HIVE-11666 URL: https://issues.apache.org/jira/browse/HIVE-11666 Project: Hive Issue Type: Sub-task Components: CLI, HiveServer2 Reporter: Chaoyu Tang Hive CLI writes to the local host for INSERT OVERWRITE LOCAL DIRECTORY, but Beeline writes to the HS2 local directory. For a user migrating from CLI to Beeline, it might be a big change.
[jira] [Created] (HIVE-11667) Support Trash and Snapshot in Truncate Table
Chaoyu Tang created HIVE-11667: -- Summary: Support Trash and Snapshot in Truncate Table Key: HIVE-11667 URL: https://issues.apache.org/jira/browse/HIVE-11667 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Currently Truncate Table (or Partition) is implemented using FileSystem.delete followed by recreating the directory. It does not honor HDFS Trash even when Trash is turned on, and the table/partition cannot be truncated if it has a snapshot.
[jira] [Created] (HIVE-11786) Deprecate the use of redundant column in column stats related tables
Chaoyu Tang created HIVE-11786: -- Summary: Deprecate the use of redundant column in column stats related tables Key: HIVE-11786 URL: https://issues.apache.org/jira/browse/HIVE-11786 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant columns such as DB_NAME, TABLE_NAME, PARTITION_NAME, since these tables already have foreign keys like TBL_ID or PART_ID referencing TBLS or PARTITIONS. These redundant columns violate database normalization rules and cause a lot of inconvenience (sometimes difficulty) in implementing column stats related features. For example, when renaming a table, we have to update the TABLE_NAME column in these tables as well, which is unnecessary. This JIRA is first to deprecate the use of these columns at the HMS code level. A follow-up JIRA will be opened to focus on the DB schema change and upgrade.
[jira] [Created] (HIVE-11787) Remove the redundant columns in TAB_COL_STATS and PART_COL_STATS
Chaoyu Tang created HIVE-11787: -- Summary: Remove the redundant columns in TAB_COL_STATS and PART_COL_STATS Key: HIVE-11787 URL: https://issues.apache.org/jira/browse/HIVE-11787 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang After HIVE-11786 deprecates the use of the redundant columns in TAB_COL_STATS and PART_COL_STATS at the HMS code level, the columns DB_NAME/TABLE_NAME in TAB_COL_STATS and DB_NAME/TABLE_NAME/PARTITION_NAME in PART_COL_STATS are useless and should be removed.
[jira] [Created] (HIVE-11788) Column stats should be preserved after db/table/partition rename
Chaoyu Tang created HIVE-11788: -- Summary: Column stats should be preserved after db/table/partition rename Key: HIVE-11788 URL: https://issues.apache.org/jira/browse/HIVE-11788 Project: Hive Issue Type: Bug Components: Metastore, Statistics Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently we simply delete the column stats after renaming a database, table, or partition, since there was no easy way in HMS to update the DB_NAME, TABLE_NAME and PARTITION_NAME in TAB_COL_STATS and PART_COL_STATS. With the removal of these redundant columns from those tables (HIVE-11786), we can keep the column stats for any rename operation that does not change a column name or type.
[jira] [Created] (HIVE-11926) NPE could occur in collectStatistics when column type is varchar
Chaoyu Tang created HIVE-11926: -- Summary: NPE could occur in collectStatistics when column type is varchar Key: HIVE-11926 URL: https://issues.apache.org/jira/browse/HIVE-11926 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang If column stats are calculated and populated into HMS by a client such as Impala, the column type name stored in TAB_COL_STATS/PART_COL_STATS could be in uppercase (e.g. VARCHAR, DECIMAL). When Hive collects stats for these columns during optimization (with hive.stats.fetch.column.stats set to true), it throws an NPE. See the error message below: {code}
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: NullPointerException null
	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:103)
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172)
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:379)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:366)
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: null
	at org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:636)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:623)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:180)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124)
truncated
{code}
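A hedged sketch of the failure mode (the map and method names below are illustrative, not the actual StatsUtils code): a case-sensitive lookup keyed by lowercase type names returns null for an uppercase "VARCHAR", and unboxing that null Integer is the NPE; normalizing the stored type name before the lookup avoids it:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ColTypeLookup {
    // Hypothetical per-type defaults, keyed by lowercase names only.
    private static final Map<String, Integer> DEFAULT_LEN = new HashMap<>();
    static {
        DEFAULT_LEN.put("varchar", 65535);
        DEFAULT_LEN.put("decimal", 38);
    }

    // Buggy variant: get() returns null for "VARCHAR", and unboxing
    // null into int throws NullPointerException.
    public static int avgLenBuggy(String colType) {
        return DEFAULT_LEN.get(colType);
    }

    // Fixed variant: normalize case before the lookup.
    public static int avgLenFixed(String colType) {
        return DEFAULT_LEN.get(colType.toLowerCase(Locale.ROOT));
    }
}
```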
[jira] [Created] (HIVE-11941) Update committer list
Chaoyu Tang created HIVE-11941: -- Summary: Update committer list Key: HIVE-11941 URL: https://issues.apache.org/jira/browse/HIVE-11941 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Priority: Minor Please update the committer list in http://hive.apache.org/people.html: --- Name: Chaoyu Tang Apache ID: ctang Organization: Cloudera (www.cloudera.com)
[jira] [Created] (HIVE-11964) RelOptHiveTable.hiveColStatsMap might contain mismatched column stats
Chaoyu Tang created HIVE-11964: -- Summary: RelOptHiveTable.hiveColStatsMap might contain mismatched column stats Key: HIVE-11964 URL: https://issues.apache.org/jira/browse/HIVE-11964 Project: Hive Issue Type: Bug Components: Query Planning, Statistics Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang RelOptHiveTable.hiveColStatsMap might contain mismatched stats, since it was built by assuming that the stats returned from {code}
hiveColStats = StatsUtils.getTableColumnStats(hiveTblMetadata, hiveNonPartitionCols, nonPartColNamesThatRqrStats);
{code} or {code}
HiveMetaStoreClient.getTableColumnStatistics(dbName, tableName, colNames)
{code} have the same order as the requested columns. But actually the order is non-deterministic. Therefore, the returned stats should be re-ordered before they are put into hiveColStatsMap.
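The proposed re-ordering could look like the following sketch (ColStat is a stand-in for Hive's ColStatistics, not the real class): match the returned stats back to the requested column order by name before building the map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatsReorder {
    // Minimal stand-in for a per-column statistics object.
    static class ColStat {
        final String colName;
        ColStat(String colName) { this.colName = colName; }
    }

    // Re-orders the (non-deterministically ordered) returned stats to
    // match the requested column list, keyed by column name.
    public static List<ColStat> reorder(List<String> requested, List<ColStat> returned) {
        Map<String, ColStat> byName = new HashMap<>();
        for (ColStat cs : returned) {
            byName.put(cs.colName, cs);
        }
        List<ColStat> ordered = new ArrayList<>(requested.size());
        for (String name : requested) {
            ordered.add(byName.get(name));   // null if stats for a column are missing
        }
        return ordered;
    }
}
```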
[jira] [Created] (HIVE-11995) Remove repetitively setting permissions in insert/load overwrite partition
Chaoyu Tang created HIVE-11995: -- Summary: Remove repetitively setting permissions in insert/load overwrite partition Key: HIVE-11995 URL: https://issues.apache.org/jira/browse/HIVE-11995 Project: Hive Issue Type: Bug Components: Security Reporter: Chaoyu Tang Assignee: Chaoyu Tang When hive.warehouse.subdir.inherit.perms is set to true, insert/load overwrite .. partition sets table and partition permissions repetitively, which is unnecessary and causes a performance issue, especially when multiple levels of partitions are involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12053) Stats performance regression caused by HIVE-11786
Chaoyu Tang created HIVE-12053: -- Summary: Stats performance regression caused by HIVE-11786 Key: HIVE-12053 URL: https://issues.apache.org/jira/browse/HIVE-12053 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang HIVE-11786 tried to normalize the tables TAB_COL_STATS/PART_COL_STATS but caused a performance regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12188) DoAs does not work properly in non-kerberos secured HS2
Chaoyu Tang created HIVE-12188: -- Summary: DoAs does not work properly in non-kerberos secured HS2 Key: HIVE-12188 URL: https://issues.apache.org/jira/browse/HIVE-12188 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang The case with the following settings is valid but still does not seem to work correctly in the current HS2 == hive.server2.authentication=NONE (or LDAP) hive.server2.enable.doAs= true hive.metastore.sasl.enabled=true (with HMS Kerberos enabled) == Currently HS2 is able to fetch a delegation token from a Kerberos-secured HMS only when HS2 itself is also Kerberos secured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12218) Unable to create a like table for an hbase backed table
Chaoyu Tang created HIVE-12218: -- Summary: Unable to create a like table for an hbase backed table Key: HIVE-12218 URL: https://issues.apache.org/jira/browse/HIVE-12218 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang For an HBase backed table: {code} CREATE TABLE hbasetbl (key string, state string, country string, country_id int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "info:state,info:country,info:country_id" ); {code} Creating its like table with a query such as create table hbasetbl_like like hbasetbl; fails with the error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: must specify an InputFormat class -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12245) Support column comments for an HBase backed table
Chaoyu Tang created HIVE-12245: -- Summary: Support column comments for an HBase backed table Key: HIVE-12245 URL: https://issues.apache.org/jira/browse/HIVE-12245 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Currently the column comments of an HBase backed table are always returned as "from deserializer". For example, {code} CREATE TABLE hbasetbl (key string comment 'It is key', state string comment 'It is state', country string comment 'It is country', country_id int comment 'It is country_id') STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "info:state,info:country,info:country_id" ); hive> describe hbasetbl; key string from deserializer state string from deserializer country string from deserializer country_id int from deserializer {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12248) The rawStore used in DBTokenStore should be thread-safe
Chaoyu Tang created HIVE-12248: -- Summary: The rawStore used in DBTokenStore should be thread-safe Key: HIVE-12248 URL: https://issues.apache.org/jira/browse/HIVE-12248 Project: Hive Issue Type: Bug Components: Authentication Reporter: Chaoyu Tang Assignee: Chaoyu Tang A non-thread-safe implementation of RawStore, particularly ObjectStore, set in DBTokenStore is being shared by multiple threads, which causes a race condition in DataNucleus when accessing the backend DB. The DN PersistenceManager (PM) in ObjectStore is not thread safe, so DBTokenStore should use a ThreadLocal ObjectStore. The following errors might be root-caused by the race condition in the DN PM. {code} Object of type "org.apache.hadoop.hive.metastore.model.MDelegationToken" is detached. Detached objects cannot be used with this operation. org.datanucleus.exceptions.ObjectDetachedException: Object of type "org.apache.hadoop.hive.metastore.model.MDelegationToken" is detached. Detached objects cannot be used with this operation. at org.datanucleus.ExecutionContextImpl.assertNotDetached(ExecutionContextImpl.java:5728) at org.datanucleus.ExecutionContextImpl.retrieveObject(ExecutionContextImpl.java:1859) at org.datanucleus.ExecutionContextThreadedImpl.retrieveObject(ExecutionContextThreadedImpl.java:203) at org.datanucleus.api.jdo.JDOPersistenceManager.jdoRetrieve(JDOPersistenceManager.java:605) at org.datanucleus.api.jdo.JDOPersistenceManager.retrieveAll(JDOPersistenceManager.java:693) at org.datanucleus.api.jdo.JDOPersistenceManager.retrieveAll(JDOPersistenceManager.java:713) at org.apache.hadoop.hive.metastore.ObjectStore.getAllTokenIdentifiers(ObjectStore.java:6517) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
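The ThreadLocal approach suggested above can be sketched like this. It is a minimal, hypothetical illustration: the Store class stands in for the real ObjectStore, and the names are made up; it only demonstrates giving each handler thread its own lazily created instance instead of sharing one non-thread-safe object.

```java
// Hypothetical sketch: each thread gets its own Store instance via
// ThreadLocal, so no Store (stand-in for ObjectStore) is ever shared.
public class ThreadLocalStoreDemo {
    public static class Store {
        // Record the creating thread so per-thread ownership is observable.
        public final long ownerThreadId = Thread.currentThread().getId();
    }

    private static final ThreadLocal<Store> STORE =
            ThreadLocal.withInitial(Store::new);

    // Lazily creates the calling thread's instance on first access,
    // then always returns that same instance for this thread.
    public static Store get() {
        return STORE.get();
    }
}
```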
[jira] [Created] (HIVE-12259) Command containing semicolon is broken in Beeline
Chaoyu Tang created HIVE-12259: -- Summary: Command containing semicolon is broken in Beeline Key: HIVE-12259 URL: https://issues.apache.org/jira/browse/HIVE-12259 Project: Hive Issue Type: Bug Components: Beeline Reporter: Chaoyu Tang Assignee: Chaoyu Tang A Beeline command (!cmd) containing a semicolon is broken. For example: !connect jdbc:hive2://localhost:10001/default;principal=hive/xyz@realm.com is broken because the embedded ";" prevents it from being run with execCommandWithPrefix as a whole command. {code} if (line.startsWith(COMMAND_PREFIX) && !line.contains(";")) { // handle the case "!cmd" for beeline return execCommandWithPrefix(line); } else { return commands.sql(line, getOpts().getEntireLineAsCommand()); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
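A minimal sketch of the kind of fix implied: base the dispatch decision on the command prefix alone, so an embedded ";" (e.g. in a JDBC URL) no longer disqualifies the line. This is a hypothetical helper, not Beeline's actual code.

```java
// Hypothetical sketch: a "!cmd" line is always a whole Beeline command,
// regardless of whether it contains ';'.
public class BeelineDispatch {
    public static final String COMMAND_PREFIX = "!";

    // true  -> would be run via execCommandWithPrefix as a whole command
    // false -> would be handed off to the SQL path
    public static boolean isBeelineCommand(String line) {
        return line.trim().startsWith(COMMAND_PREFIX);
    }
}
```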
[jira] [Created] (HIVE-12270) Add DBTokenStore support to HS2 delegation token
Chaoyu Tang created HIVE-12270: -- Summary: Add DBTokenStore support to HS2 delegation token Key: HIVE-12270 URL: https://issues.apache.org/jira/browse/HIVE-12270 Project: Hive Issue Type: New Feature Reporter: Chaoyu Tang Assignee: Chaoyu Tang DBTokenStore was initially introduced by HIVE-3255 in Hive-0.12, mainly for the HMS delegation token. Later, in Hive-0.13, HS2 delegation token support was introduced by HIVE-5155, but it used MemoryTokenStore as the token store. HIVE-9622's approach of using the shared RawStore (or HMSHandler) to access the token/key information in the HMS DB directly from HS2 does not seem to be the right way to support DBTokenStore in HS2. I think we should use HiveMetaStoreClient in HS2 instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12306) hbase_queries.q fails in Hive 1.3.0
Chaoyu Tang created HIVE-12306: -- Summary: hbase_queries.q fails in Hive 1.3.0 Key: HIVE-12306 URL: https://issues.apache.org/jira/browse/HIVE-12306 Project: Hive Issue Type: Bug Affects Versions: 1.3.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial hbase_queries.q is failing (only in version 1.3.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12346) Internally used variables in HiveConf should not be settable via command
Chaoyu Tang created HIVE-12346: -- Summary: Internally used variables in HiveConf should not be settable via command Key: HIVE-12346 URL: https://issues.apache.org/jira/browse/HIVE-12346 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Some HiveConf variables such as hive.added.jars.path are only for internal use and should not be settable via the set command. We have seen many cases where users mistakenly set these variables using the set command, even though some of them are documented as "internal parameters" in Hive. The command usually succeeds but sometimes does not take effect, which causes confusion. For example, hive.added.jars.path can be set via the set command but is sometimes overridden by session resource jars at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed
Chaoyu Tang created HIVE-12365: -- Summary: Added resource path is sent to cluster as an empty string when externally removed Key: HIVE-12365 URL: https://issues.apache.org/jira/browse/HIVE-12365 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Sometimes the resources (e.g. jars) added via a command like "add jars " are removed externally from their file paths for some reason. Their paths are then sent to the cluster as empty strings, which fails even queries that do not need these jars during execution. The error looks like the following: {code} 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003 java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.<init>(Path.java:135) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12505) Insert overwrite in same encrypted zone silently fails to remove some existing files
Chaoyu Tang created HIVE-12505: -- Summary: Insert overwrite in same encrypted zone silently fails to remove some existing files Key: HIVE-12505 URL: https://issues.apache.org/jira/browse/HIVE-12505 Project: Hive Issue Type: Bug Components: Encryption Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang With HDFS Trash enabled but its encryption zone lower than the Hive data directory, the insert overwrite command silently fails to trash the existing files during overwrite, which could lead to unexpectedly incorrect results (more rows returned than expected) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12566) Incorrect result returns when using COALESCE in WHERE condition with LEFT JOIN
Chaoyu Tang created HIVE-12566: -- Summary: Incorrect result returns when using COALESCE in WHERE condition with LEFT JOIN Key: HIVE-12566 URL: https://issues.apache.org/jira/browse/HIVE-12566 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 0.13.0 Reporter: Chaoyu Tang Priority: Critical The left join query with an on/where clause returns an incorrect result (more rows are returned). See the reproducible sample below. Left table with data: {code} CREATE TABLE ltable (i int, la int, lk1 string, lk2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; --- 1,\N,CD5415192314304,00071 2,\N,CD5415192225530,00071 {code} Right table with data: {code} CREATE TABLE rtable (ra int, rk1 string, rk2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; --- 1,CD5415192314304,00071 45,CD5415192314304,00072 {code} Query: {code} SELECT * FROM ltable l LEFT OUTER JOIN rtable r on (l.lk1 = r.rk1 AND l.lk2 = r.rk2) WHERE COALESCE(l.la,'EMPTY')=COALESCE(r.ra,'EMPTY'); {code} Result returned: {code} 1 NULL CD5415192314304 00071 NULL NULL NULL 2 NULL CD5415192225530 00071 NULL NULL NULL {code} The correct result should be {code} 2 NULL CD5415192225530 00071 NULL NULL NULL {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12607) Hive fails on zero length sequence files
Chaoyu Tang created HIVE-12607: -- Summary: Hive fails on zero length sequence files Key: HIVE-12607 URL: https://issues.apache.org/jira/browse/HIVE-12607 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.1 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Flume will, at times, generate zero length sequence files which cause the Hive query failure. To reproduce the issue: {code} > create external table test (id string) partitioned by (system string,date string) STORED AS SEQUENCEFILE LOCATION '/user/me/test'; hadoop fs -mkdir /user/me/test/logs hadoop fs -mkdir /user/me/test/logs/date=2014 hadoop fs -touchz /user/me/test/logs/date=2014/a.txt hive -> ALTER TABLE test ADD PARTITION (system = 'logs',date='2014') location '/user/me/test/logs/date=2014'; -> select * from test t1,test t2 where t1.id = t2.id; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12713) Miscellaneous improvements in driver compile and execute logging
Chaoyu Tang created HIVE-12713: -- Summary: Miscellaneous improvements in driver compile and execute logging Key: HIVE-12713 URL: https://issues.apache.org/jira/browse/HIVE-12713 Project: Hive Issue Type: Improvement Components: Logging Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Miscellaneous compile and execute logging improvements include: 1. ensuring that only the redacted query is logged 2. removing redundant variable substitution in HS2 SQLOperation 3. logging the query and its compilation time without having to enable PerfLogger debug, to help identify badly written queries which take a lot of time to compile and probably cause other good queries to be queued (HIVE-12516) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12812) Enable mapred.input.dir.recursive by default to support union with aggregate function
Chaoyu Tang created HIVE-12812: -- Summary: Enable mapred.input.dir.recursive by default to support union with aggregate function Key: HIVE-12812 URL: https://issues.apache.org/jira/browse/HIVE-12812 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1, 2.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang When union remove optimization is enabled, a union query with an aggregate function writes its subquery intermediate results to subdirs, which requires mapred.input.dir.recursive to be enabled in order for them to be fetched. This property is not defined by default in Hive and is often overlooked by users, which causes query failures that are hard to debug. So we need to set mapred.input.dir.recursive to true whenever union remove optimization is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12840) stats optimize subqueries whenever possible in a union query
Chaoyu Tang created HIVE-12840: -- Summary: stats optimize subqueries whenever possible in a union query Key: HIVE-12840 URL: https://issues.apache.org/jira/browse/HIVE-12840 Project: Hive Issue Type: Improvement Components: Logical Optimizer Reporter: Chaoyu Tang HIVE-12788 addressed a data correctness issue in union queries with aggregate functions when stats optimization is enabled. It disables stats optimization for the whole query if any of its subqueries cannot be optimized. [~pxiong] suggested an enhancement to leverage the stats optimizer whenever possible (even for only one branch of a union), and we need to investigate a possible solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12901) Incorrect result from select '\\' like '\\%'
Chaoyu Tang created HIVE-12901: -- Summary: Incorrect result from select '\\' like '\\%' Key: HIVE-12901 URL: https://issues.apache.org/jira/browse/HIVE-12901 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.1 Reporter: Chaoyu Tang The query returns false. MySQL actually also returns 0, which I do not think is right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12965) Insert overwrite local directory should preserve the overwritten directory permission
Chaoyu Tang created HIVE-12965: -- Summary: Insert overwrite local directory should preserve the overwritten directory permission Key: HIVE-12965 URL: https://issues.apache.org/jira/browse/HIVE-12965 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang In Hive, "insert overwrite local directory" first deletes the overwritten directory if it exists, recreates a new one, then copies the files from the src directory to the new local directory. This process sometimes changes the permissions of the overwritten local directory, causing some applications to no longer be able to access its content. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13082) Enable constant propagation optimization in query with left semi join
Chaoyu Tang created HIVE-13082: -- Summary: Enable constant propagation optimization in query with left semi join Key: HIVE-13082 URL: https://issues.apache.org/jira/browse/HIVE-13082 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 2.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently constant folding is only allowed for inner or unique joins; I think it should also be applicable to left semi joins. Otherwise, a query like the following, with multiple joins including a left semi join, fails: {code} select table1.id, table1.val, table2.val2 from table1 inner join table2 on table1.val = 't1val01' and table1.id = table2.id left semi join table3 on table1.dimid = table3.id; {code} with errors: {code} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.0.jar:?] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.0.jar:?] Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) ~[hadoop-common-2.6.0.jar:?] at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) ~[hadoop-common-2.6.0.jar:?] at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) ~[hadoop-common-2.6.0.jar:?] at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446) ~[hadoop-mapreduce-client-core-2.6.0.jar:?] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.0.jar:?] at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.0.jar:?] 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_45] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_45] at java.lang.Thread.run(Thread.java:744) ~[?:1.7.0_45] ... Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3 at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[?:1.7.0_45] at java.util.ArrayList.get(ArrayList.java:411) ~[?:1.7.0_45] at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.(StandardStructObjectInspector.java:109) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:326) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:311) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:181) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:319) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:78) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:138) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:355) 
~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13164) Predicate pushdown may cause cross-product in left semi join
Chaoyu Tang created HIVE-13164: -- Summary: Predicate pushdown may cause cross-product in left semi join Key: HIVE-13164 URL: https://issues.apache.org/jira/browse/HIVE-13164 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang For some left semi join queries like followings: select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0'; or select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0'; Their plans show that they have been converted to keyless cross-product due to the predicate pushdown and the dropping of the on condition. {code} LOGICAL PLAN: t1:t1 TableScan (TS_0) alias: t1 Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator (FIL_18) predicate: (key = 0) (type: boolean) Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_2) Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_9) sort order: Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE Join Operator (JOIN_11) condition map: Left Semi Join 0 to 1 keys: 0 1 Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE Group By Operator (GBY_13) aggregations: count(1) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_14) sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Group By Operator (GBY_15) aggregations: count(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator (FS_17) compressed: false Statistics: Num rows: 1 Data 
size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe t2:t2 TableScan (TS_3) alias: t2 Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator (FIL_19) predicate: ((key = 0) and (value = 'val_0')) (type: boolean) Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_5) Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE Group By Operator (GBY_8) keys: 'val_0' (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_10) sort order: Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE Join Operator (JOIN_11) condition map: Left Semi Join 0 to 1 keys: 0 1 Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE {code} [~gopalv], do you think these plans are valid or not? Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13243) Hive drop table on encryption zone fails for external tables
Chaoyu Tang created HIVE-13243: -- Summary: Hive drop table on encryption zone fails for external tables Key: HIVE-13243 URL: https://issues.apache.org/jira/browse/HIVE-13243 Project: Hive Issue Type: Bug Components: Encryption, Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang When dropping an external table with its data located in an encryption zone, Hive should not throw MetaException(message:Unable to drop table because it is in an encryption zone and trash is enabled. Use PURGE option to skip trash.) in checkTrashPurgeCombination, since the data should not get deleted (or trashed) anyway, regardless of whether HDFS Trash is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13294) AvroSerde leaks the connection in a case when reading schema from a url
Chaoyu Tang created HIVE-13294: -- Summary: AvroSerde leaks the connection in a case when reading schema from a url Key: HIVE-13294 URL: https://issues.apache.org/jira/browse/HIVE-13294 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Chaoyu Tang Assignee: Chaoyu Tang AvroSerde leaks the connection when reading the schema from a URL: in public static Schema determineSchemaOrThrowException { ... return AvroSerdeUtils.getSchemaFor(new URL(schemaString).openStream()); ... } the opened InputStream is never closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
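One conventional way to avoid such a leak is try-with-resources, which closes the stream on every path, including a parse failure. The sketch below is a hypothetical stand-in for AvroSerdeUtils.getSchemaFor(InputStream), not the actual Hive fix: readSchema simply drains the stream to a string, and it wraps IOException so callers need no checked-exception handling.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: consume a schema stream and close it deterministically
// via try-with-resources, instead of leaving it open after parsing.
public class SchemaFetch {
    public static String readSchema(InputStream in) {
        try (InputStream s = in) {   // closed even if reading throws
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = s.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```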
[jira] [Created] (HIVE-13401) Kerberized HS2 with LDAP auth enabled fails the delegation token authentication
Chaoyu Tang created HIVE-13401: -- Summary: Kerberized HS2 with LDAP auth enabled fails the delegation token authentication Key: HIVE-13401 URL: https://issues.apache.org/jira/browse/HIVE-13401 Project: Hive Issue Type: Bug Components: Authentication Reporter: Chaoyu Tang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path
Chaoyu Tang created HIVE-13509: -- Summary: HCatalog getSplits should ignore the partition with invalid path Key: HIVE-13509 URL: https://issues.apache.org/jira/browse/HIVE-13509 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Chaoyu Tang Assignee: Chaoyu Tang It is quite common that there is a discrepancy between a partition directory and its HMS metadata, simply because the directory could be added/deleted externally using hdfs shell commands. Technically this should be fixed by MSCK, alter table .. add/drop commands, etc., but sometimes that might not be practical, especially in a multi-tenant env. This discrepancy does not cause any problem for Hive, which returns no rows for a partition with an invalid (e.g. non-existing) path, but it fails the Pig load with HCatLoader, because the HCatBaseInputFormat getSplits throws an error when getting a split for a non-existing path. The error message might look like: {code} Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) at org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
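The proposed behavior can be sketched as follows. This is a hypothetical helper, not HCatalog's actual code: a Predicate stands in for FileSystem.exists(Path), and the idea is simply to drop partitions whose directory no longer exists before computing splits, instead of letting FileInputFormat throw InvalidInputException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: keep only partition paths that still exist;
// missing ones are skipped rather than failing the whole job.
public class PartitionFilter {
    public static List<String> validPaths(List<String> partitionPaths,
                                          Predicate<String> existsFn) {
        List<String> valid = new ArrayList<>();
        for (String path : partitionPaths) {
            if (existsFn.test(path)) {
                valid.add(path);
            }
            // else: a real implementation would log the skipped partition
        }
        return valid;
    }
}
```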
[jira] [Created] (HIVE-13588) NPE is thrown from MapredLocalTask.executeInChildVM
Chaoyu Tang created HIVE-13588: -- Summary: NPE is thrown from MapredLocalTask.executeInChildVM Key: HIVE-13588 URL: https://issues.apache.org/jira/browse/HIVE-13588 Project: Hive Issue Type: Bug Components: Logging Reporter: Chaoyu Tang Assignee: Chaoyu Tang NPE was thrown out from MapredLocalTask.executeInChildVM in running some queries with CLI, see error below: {code} java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1868) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1595) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) 
[hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_45] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_45] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_45] {code} This is because the operationLog is only applicable to HS2, not CLI, so it might not be set (null). It is related to HIVE-13183 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13590) Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
Chaoyu Tang created HIVE-13590: -- Summary: Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case Key: HIVE-13590 URL: https://issues.apache.org/jira/browse/HIVE-13590 Project: Hive Issue Type: Bug Components: Authentication, Security Reporter: Chaoyu Tang Assignee: Chaoyu Tang In a kerberized HS2 with LDAP authentication enabled, an LDAP user usually logs in with a username in the form username@domain in the multi-domain LDAP case. But it fails if the domain is not in the Hadoop auth_to_local mapping rules; the error is as follows: {code} Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to ct...@mydomain.com at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389) at org.apache.hadoop.security.User.<init>(User.java:48) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13748) TypeInfoParser cannot handle the dash in the field name of a complex type
Chaoyu Tang created HIVE-13748: -- Summary: TypeInfoParser cannot handle the dash in the field name of a complex type Key: HIVE-13748 URL: https://issues.apache.org/jira/browse/HIVE-13748 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor hive> create table y(col struct<`a-b`:double> COMMENT 'type field has a dash'); FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: Error: : expected at the position 8 of 'struct' but '-' is found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13953) Issues in HiveLockObject equals method
Chaoyu Tang created HIVE-13953: -- Summary: Issues in HiveLockObject equals method Key: HIVE-13953 URL: https://issues.apache.org/jira/browse/HIVE-13953 Project: Hive Issue Type: Bug Components: Locking Reporter: Chaoyu Tang Assignee: Chaoyu Tang There are two issues in the equals method of HiveLockObject: {code} @Override public boolean equals(Object o) { if (!(o instanceof HiveLockObject)) { return false; } HiveLockObject tgt = (HiveLockObject) o; return Arrays.equals(pathNames, tgt.pathNames) && data == null ? tgt.getData() == null : tgt.getData() != null && data.equals(tgt.getData()); } {code} 1. Arrays.equals(pathNames, tgt.pathNames) might return false for the same path in HiveLockObject, since in current Hive the pathname components may be stored in two ways: taking a dynamic partition path db/tbl/part1/part2 as an example, it might be stored in pathNames as an array of four elements (db, tbl, part1, part2) or as an array with the single element db/tbl/part1/part2. It would be safer to compare the pathNames using StringUtils.equals(this.getName(), tgt.getName()). 2. The comparison logic is not right: because && binds tighter than the ternary operator, the expression parses as (Arrays.equals(...) && data == null) ? ... : ..., so the path check is not ANDed with the data check. A corrected version: {code} @Override public boolean equals(Object o) { if (!(o instanceof HiveLockObject)) { return false; } HiveLockObject tgt = (HiveLockObject) o; return StringUtils.equals(this.getName(), tgt.getName()) && (data == null ? tgt.getData() == null : data.equals(tgt.getData())); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
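The precedence problem can be demonstrated in isolation. This is a minimal sketch (the method and class names are hypothetical, mirroring but not copying HiveLockObject): with the original expression, two locks with different paths but equal data still compare equal.

```java
// Demonstrates the operator-precedence bug: `p && c ? x : y` parses as
// `(p && c) ? x : y`, folding the path check into the ternary condition.
public class LockEqualsDemo {

    // The original (buggy) expression shape from HiveLockObject.equals.
    static boolean buggyEquals(boolean pathsEqual, String data, String tgtData) {
        return pathsEqual && data == null ? tgtData == null
                : tgtData != null && data.equals(tgtData);
    }

    // Corrected expression: parentheses keep the path check ANDed in.
    static boolean fixedEquals(boolean pathsEqual, String data, String tgtData) {
        return pathsEqual
                && (data == null ? tgtData == null : data.equals(tgtData));
    }

    public static void main(String[] args) {
        // Different paths, same data: the buggy version wrongly says equal.
        System.out.println(buggyEquals(false, "EXCLUSIVE", "EXCLUSIVE")); // true (wrong)
        System.out.println(fixedEquals(false, "EXCLUSIVE", "EXCLUSIVE")); // false (right)
    }
}
```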
[jira] [Created] (HIVE-13959) MoveTask should only release its query associated locks
Chaoyu Tang created HIVE-13959: -- Summary: MoveTask should only release its query associated locks Key: HIVE-13959 URL: https://issues.apache.org/jira/browse/HIVE-13959 Project: Hive Issue Type: Bug Components: Locking Reporter: Chaoyu Tang Assignee: Chaoyu Tang releaseLocks in MoveTask releases all locks under a HiveLockObject's pathNames, but some of the locks under these pathNames might belong to other queries and should not be released. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13975) Hive import table fails if there is no write access to the source location
Chaoyu Tang created HIVE-13975: -- Summary: Hive import table fails if there is no write access to the source location Key: HIVE-13975 URL: https://issues.apache.org/jira/browse/HIVE-13975 Project: Hive Issue Type: Bug Components: Import/Export Reporter: Chaoyu Tang Assignee: Chaoyu Tang It does not seem right that write permission is needed on the source side for import table: the CopyTask in import needs to create a staging directory under the imported source directory, so a user who does not have write permission on the source directory will get an error like the following: {code} Caused by: java.lang.RuntimeException: Cannot create staging directory 'hdfs://quickstart.cloudera:8020/user/hive/exp_t1/.hive-staging_hive_2016-05-26_16-38-29_453_8739265934924968327-1': Permission denied: user=test1, access=WRITE, inode="/user/hive/exp_t1":anonymous:supergroup:drwxrwxr-x ... org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:952) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:945) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1856) at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:518) at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:234) ... 23 more {code} There are three tasks involved in import table: CopyTask, DDLTask and MoveTask. I wonder if the CopyTask is really needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14161) from_utc_timestamp()/to_utc_timestamp return incorrect results with EST
Chaoyu Tang created HIVE-14161: -- Summary: from_utc_timestamp()/to_utc_timestamp return incorrect results with EST Key: HIVE-14161 URL: https://issues.apache.org/jira/browse/HIVE-14161 Project: Hive Issue Type: Bug Components: UDF Reporter: Chaoyu Tang Assignee: Chaoyu Tang {code} hive> SELECT to_utc_timestamp('2016-06-30 06:00:00', 'PST'); OK 2016-06-30 13:00:00 ==>Correct, UTC is 7 hours ahead of PST Time taken: 1.674 seconds, Fetched: 1 row(s) hive> SELECT to_utc_timestamp('2016-06-30 08:00:00', 'CST'); OK 2016-06-30 13:00:00 ==>Correct, UTC is 5 hours ahead of CST Time taken: 1.776 seconds, Fetched: 1 row(s) hive> SELECT to_utc_timestamp('2016-06-30 09:00:00', 'EST'); OK 2016-06-30 14:00:00 ==>Wrong, UTC should be 4 hours ahead of EST Time taken: 1.686 seconds, Fetched: 1 row(s) hive> select from_utc_timestamp('2016-06-30 14:00:00', 'EST'); OK 2016-06-30 09:00:00 ==>Wrong, UTC should be 4 hours ahead of EST {code} It might be something related to daylight savings time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
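The daylight saving suspicion matches how the JDK resolves the abbreviation. A sketch, assuming the UDFs ultimately resolve zone IDs via java.util.TimeZone.getTimeZone: "EST" maps to a fixed UTC-5 zone that never observes daylight saving, whereas the region-based ID does, which explains the extra hour in June:

```java
import java.util.TimeZone;

public class EstZoneDemo {
    public static void main(String[] args) {
        // "EST" is a fixed-offset zone in the JDK: always UTC-5, never DST.
        TimeZone est = TimeZone.getTimeZone("EST");
        // The region-based ID observes daylight saving (UTC-4 in summer).
        TimeZone ny = TimeZone.getTimeZone("America/New_York");

        System.out.println(est.getRawOffset() / 3_600_000); // -5
        System.out.println(est.useDaylightTime());          // false
        System.out.println(ny.useDaylightTime());           // true
    }
}
```

Using a region ID such as America/New_York instead of the three-letter abbreviation avoids the discrepancy.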
[jira] [Created] (HIVE-14173) NPE was thrown after enabling directsql in the middle of session
Chaoyu Tang created HIVE-14173: -- Summary: NPE was thrown after enabling directsql in the middle of session Key: HIVE-14173 URL: https://issues.apache.org/jira/browse/HIVE-14173 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang hive.metastore.try.direct.sql is initially set to false in the HMS hive-site.xml and is then changed to true using the set metaconf command in the middle of a session; running a query then throws an NPE with the following error message: {code} 2016-07-06T17:44:41,489 ERROR [pool-5-thread-2]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(192)) - MetaException(message:java.lang.NullPointerException) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5741) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rethrowException(HiveMetaStore.java:4771) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4754) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) at com.sun.proxy.$Proxy18.get_partitions_by_expr(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12048) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:12032) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.(ObjectStore.java:2667) at org.apache.hadoop.hive.metastore.ObjectStore$GetListHelper.(ObjectStore.java:2825) at org.apache.hadoop.hive.metastore.ObjectStore$4.(ObjectStore.java:2410) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:2410) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:2400) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) at com.sun.proxy.$Proxy17.getPartitionsByExpr(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:4749) ... 20 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14281) Issue in decimal multiplication
Chaoyu Tang created HIVE-14281: -- Summary: Issue in decimal multiplication Key: HIVE-14281 URL: https://issues.apache.org/jira/browse/HIVE-14281 Project: Hive Issue Type: Bug Components: Types Reporter: Chaoyu Tang Assignee: Chaoyu Tang {code} CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); INSERT OVERWRITE TABLE test VALUES (20, 20); SELECT a*b from test {code} The returned result is NULL instead of 400. This is because Hive adds the scales of the operands, so the type of a*b is set to decimal(38,36), which leaves only two digits before the decimal point; Hive does not handle this overflow properly (e.g. by rounding). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
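The arithmetic can be checked with java.math.BigDecimal, which follows the same scale-addition rule for multiplication; a sketch of why decimal(38,36) cannot hold 400:

```java
import java.math.BigDecimal;

public class DecimalOverflowDemo {
    public static void main(String[] args) {
        // Two decimal(38,18)-style values: 20 with scale 18.
        BigDecimal a = new BigDecimal("20").setScale(18);
        BigDecimal b = new BigDecimal("20").setScale(18);

        // BigDecimal multiplication adds the operand scales: 18 + 18 = 36.
        BigDecimal product = a.multiply(b);
        System.out.println(product.scale());     // 36

        // 400 needs 3 integer digits; 3 + 36 = 39 significant digits,
        // which exceeds the 38-digit cap, so Hive has no room for the value.
        System.out.println(product.precision()); // 39
    }
}
```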
[jira] [Created] (HIVE-14298) NPE could be thrown in HMS when an ExpressionTree could not be made from a filter
Chaoyu Tang created HIVE-14298: -- Summary: NPE could be thrown in HMS when an ExpressionTree could not be made from a filter Key: HIVE-14298 URL: https://issues.apache.org/jira/browse/HIVE-14298 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang In many cases an ExpressionTree cannot be made from a filter (e.g. the parser fails to parse it) and its value is null. This null is then passed around and used by a couple of HMS methods, which can cause a NullPointerException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14347) Inconsistent behavior in decimal multiplication
Chaoyu Tang created HIVE-14347: -- Summary: Inconsistent behavior in decimal multiplication Key: HIVE-14347 URL: https://issues.apache.org/jira/browse/HIVE-14347 Project: Hive Issue Type: Bug Components: Types Reporter: Chaoyu Tang Assignee: Chaoyu Tang 1. select cast('20' as decimal(38,18)) * cast('10' as decimal(38,18)) from test; returns 200, but the type of the multiplication result is decimal(38,36) as shown in the query plan. 2. select a*b from a table where columns a and b both have type decimal(38,18) and values 20 and 10 respectively; we get result NULL but type decimal(38,36). -- If we strictly followed the current precision/scale rules for decimal multiplication in Hive, the result in case 1 (200) would already exceed the range that decimal(38,36) supports and should also return null. Hive currently deduces the precision/scale from the constant values (20 and 10) and uses (2,0) instead of the specified (38,18) in the multiplication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14359) Spark might fail in LDAP authentication in kerberized cluster
Chaoyu Tang created HIVE-14359: -- Summary: Spark might fail in LDAP authentication in kerberized cluster Key: HIVE-14359 URL: https://issues.apache.org/jira/browse/HIVE-14359 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang When HS2 is used as a gateway for LDAP users to access and run queries in a kerberized cluster, its authentication mode is configured as LDAP, and in that case HoS might fail for the same reason as HIVE-10594. hive.server2.authentication is not a proper property to determine whether a cluster is kerberized; hadoop.security.authentication should be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
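For reference, the property that actually marks a cluster as kerberized lives in core-site.xml; a minimal fragment:

```xml
<!-- core-site.xml: hadoop.security.authentication, not
     hive.server2.authentication, indicates whether the cluster is kerberized. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
```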
[jira] [Created] (HIVE-14395) Add the missing data files to Avro union tests (HIVE-14205 addendum)
Chaoyu Tang created HIVE-14395: -- Summary: Add the missing data files to Avro union tests (HIVE-14205 addendum) Key: HIVE-14395 URL: https://issues.apache.org/jira/browse/HIVE-14395 Project: Hive Issue Type: Bug Components: Test Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial The union_non_nullable.txt & union_nullable.txt were not checked in for HIVE-14205. It was my mistake. It is the reason that testCliDriver_avro_nullable_union & testNegativeCliDriver_avro_non_nullable_union are failing in current pre-commit build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14457) Partitions in encryption zone are still trashed though an exception is returned
Chaoyu Tang created HIVE-14457: -- Summary: Partitions in encryption zone are still trashed though an exception is returned Key: HIVE-14457 URL: https://issues.apache.org/jira/browse/HIVE-14457 Project: Hive Issue Type: Bug Components: Encryption, Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang drop_partition_common in HiveMetaStore still drops partitions in an encryption zone without PURGE even though it returns an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14615) Temp table leaves behind insert command
Chaoyu Tang created HIVE-14615: -- Summary: Temp table leaves behind insert command Key: HIVE-14615 URL: https://issues.apache.org/jira/browse/HIVE-14615 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang {code} create table test (key int, value string); insert into test values (1, 'val1'); show tables; test values__tmp__table__1 {code} The temp table values__tmp__table__1 resulted from insert into ... values and persists until the session is closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14626) Support Trash in Truncate Table
Chaoyu Tang created HIVE-14626: -- Summary: Support Trash in Truncate Table Key: HIVE-14626 URL: https://issues.apache.org/jira/browse/HIVE-14626 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently Truncate Table (or Partition) is implemented as a FileSystem.delete followed by recreating the directory, so: 1. it does not support HDFS Trash; 2. if the table/partition directory is initially encryption protected, it is no longer protected after being deleted and recreated. The new implementation is to clean the contents of the directory using multi-threaded trashFiles. If Trash is enabled and has a lower encryption level than the data directory, the files under it will be deleted; otherwise, they will be trashed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14697) Can not access kerberized HS2 Web UI
Chaoyu Tang created HIVE-14697: -- Summary: Can not access kerberized HS2 Web UI Key: HIVE-14697 URL: https://issues.apache.org/jira/browse/HIVE-14697 Project: Hive Issue Type: Bug Components: Web UI Affects Versions: 2.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Failed to access the kerberized HS2 WebUI with the following error msg: {code} curl -v -u : --negotiate http://util185.phx2.cbsig.net:10002/ > GET / HTTP/1.1 > Host: util185.phx2.cbsig.net:10002 > Authorization: Negotiate YIIU7...[redacted]... > User-Agent: curl/7.42.1 > Accept: */* > < HTTP/1.1 413 FULL head < Content-Length: 0 < Connection: close < Server: Jetty(7.6.0.v20120127) {code} It is because the Jetty default request header size (4K) is too small for some Kerberos cases, where the Negotiate token can be large. So this patch increases the request header size to 64K. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14774) Canceling query using Ctrl-C in beeline might lead to stale locks
Chaoyu Tang created HIVE-14774: -- Summary: Canceling query using Ctrl-C in beeline might lead to stale locks Key: HIVE-14774 URL: https://issues.apache.org/jira/browse/HIVE-14774 Project: Hive Issue Type: Bug Components: Locking Reporter: Chaoyu Tang Assignee: Chaoyu Tang Terminating a running query with Ctrl-C in Beeline might lead to stale locks, since the process running the query might still acquire the locks but fail to release them after the query terminates abnormally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14799) Query operations are not thread safe during cancellation
Chaoyu Tang created HIVE-14799: -- Summary: Query operations are not thread safe during cancellation Key: HIVE-14799 URL: https://issues.apache.org/jira/browse/HIVE-14799 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang When a query is cancelled either via Beeline (Ctrl-C) or the API call TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a different thread from the one running the query, in order to close/destroy its encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, which can sometimes result in runtime exceptions such as NPE. Errors from the running query are also not handled properly, which can leave some resources (files, locks, etc.) uncleaned after the query terminates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
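A minimal sketch of the fix direction (class, field, and method names are hypothetical, not Hive's actual code): serialize access to the shared driver state so the cancel thread cannot destroy it between the query thread's null check and its use:

```java
// Hypothetical stand-in for SQLOperation: the query thread and the cancel
// thread both touch `driver`, so access is serialized on a single lock.
public class OperationSketch {
    private StringBuilder driver = new StringBuilder("compiled-plan");
    private final Object driverLock = new Object();

    // Called on the query-execution thread.
    public String run() {
        synchronized (driverLock) {
            // Without the lock, cancel() could null the field between the
            // null check and the use, producing the NPEs described above.
            return driver == null ? null : driver.toString();
        }
    }

    // Called on the HS2 cancel thread (Beeline Ctrl-C / CancelOperation).
    public void cancel() {
        synchronized (driverLock) {
            driver = null; // destroy the driver only while holding the lock
        }
    }
}
```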
[jira] [Created] (HIVE-14874) Master: Update errata.txt for missing JIRA number in HIVE-9423 commit msg
Chaoyu Tang created HIVE-14874: -- Summary: Master: Update errata.txt for missing JIRA number in HIVE-9423 commit msg Key: HIVE-14874 URL: https://issues.apache.org/jira/browse/HIVE-14874 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial The JIRA number is missing in the commit msg for the master branch, see https://issues.apache.org/jira/browse/HIVE-9423?focusedCommentId=15537841&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15537841 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14930) RuntimeException was seen in explainanalyze_3.q test log
Chaoyu Tang created HIVE-14930: -- Summary: RuntimeException was seen in explainanalyze_3.q test log Key: HIVE-14930 URL: https://issues.apache.org/jira/browse/HIVE-14930 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Priority: Minor When working on HIVE-14799, I noticed there were some RuntimeExceptions when running explainanalyze_3.q and explainanalyze_5.q, though these tests showed as successful. {code} 2016-10-10T19:02:48,455 ERROR [aa5c6743-b5de-40fc-82da-5dde0e6b387f main] ql.Driver: FAILED: Hive Internal Error: java.lang.RuntimeException(Cannot overwrite read-only table: src) java.lang.RuntimeException: Cannot overwrite read-only table: src at org.apache.hadoop.hive.ql.hooks.EnforceReadOnlyTables.run(EnforceReadOnlyTables.java:74) at org.apache.hadoop.hive.ql.hooks.EnforceReadOnlyTables.run(EnforceReadOnlyTables.java:56) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1736) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1505) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1218) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1208) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:106) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:251) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:504) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1298) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1436) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1218) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1208) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1319) at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1293) at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173) at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver(TestMiniTezCliDriver.java:59) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:73) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:367
[jira] [Created] (HIVE-15043) HMS supports Oracle 12c as its backend database
Chaoyu Tang created HIVE-15043: -- Summary: HMS supports Oracle 12c as its backend database Key: HIVE-15043 URL: https://issues.apache.org/jira/browse/HIVE-15043 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Chaoyu Tang Assignee: Chaoyu Tang HMS does not work with Oracle 12c using its JDBC driver ojdbc7 (12.1.0.2 or 12.1.0.1) in any Hive version prior to 2.0. It hangs when it connects to Oracle 12c due to an issue in DataNucleus 3.2 with ojdbc7. With DN upgraded to 4.2.x in Hive 2.0 (see HIVE-6113), we need to find out whether its HMS supports 12c and its ojdbc7 drivers. If not, we should find a way in Hive to make it supported if possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15059) Flaky test: TestSemanticAnalysis#testAlterTableRename
Chaoyu Tang created HIVE-15059: -- Summary: Flaky test: TestSemanticAnalysis#testAlterTableRename Key: HIVE-15059 URL: https://issues.apache.org/jira/browse/HIVE-15059 Project: Hive Issue Type: Sub-task Components: HCatalog, Tests Reporter: Chaoyu Tang The default database location in testAlterTableRename is not the one specified by TEST_WAREHOUSE_DIR in HCatBaseTest. It looks like the following in the precommit build: {code} pfile:/home/hiveptest/104.197.110.94-hiveptest-0/apache-github-source-source/hcatalog/core/target/warehouse {code} But the TEST_WAREHOUSE_DIR should actually be like: {code} file:/home/hiveptest/104.197.110.94-hiveptest-0/apache-github-source-source/hcatalog/core/build/test/data/org.apache.hive.hcatalog.mapreduce.HCatBaseTest-1477389203834/warehouse/oldname {code} This only happened in the precommit build, not in the local environment. We need to investigate the issue since it causes failures with HIVE-14909. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15091) Master: Update errata.txt for the missing JIRA number in HIVE-14909 commit msg
Chaoyu Tang created HIVE-15091: -- Summary: Master: Update errata.txt for the missing JIRA number in HIVE-14909 commit msg Key: HIVE-15091 URL: https://issues.apache.org/jira/browse/HIVE-15091 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.2.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Missing the JIRA number in commit msg for master branch, see https://issues.apache.org/jira/browse/HIVE-14909?focusedCommentId=15614056&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15614056 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15109) Set MaxPermSize to 256M for maven tests
Chaoyu Tang created HIVE-15109: -- Summary: Set MaxPermSize to 256M for maven tests Key: HIVE-15109 URL: https://issues.apache.org/jira/browse/HIVE-15109 Project: Hive Issue Type: Test Components: Test Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Trying to run the qtests, for example mvn test -Dtest=TestMiniTezCliDriver -Dqfile=explainanalyze_1.q, I got {code} Running org.apache.hadoop.hive.cli.TestMiniTezCliDriver Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.591 sec - in org.apache.hadoop.hive.cli.TestMiniTezCliDriver {code} Looking into hive.log, I found that it was due to too small a PermGen space: {code} 2016-11-01T19:52:19,039 ERROR [org.apache.hadoop.util.JvmPauseMonitor$Monitor@261e733f] server.NIOServerCnxnFactory: Thread Thread[org.apache.hadoop.util.JvmPauseMonitor$Monitor@261e733f,5,main] died java.lang.OutOfMemoryError: PermGen space {code} Setting env MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=256M" does not help. We can set MaxPermSize in maven.test.jvm.args in pom.xml instead: {code} -Xmx2048m -XX:MaxPermSize=256M {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
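A sketch of the pom.xml change, assuming the surefire argLine is wired to the maven.test.jvm.args property as the issue describes:

```xml
<properties>
  <!-- PermGen applies to JDK 7 and earlier; -XX:MaxPermSize is ignored on JDK 8+. -->
  <maven.test.jvm.args>-Xmx2048m -XX:MaxPermSize=256M</maven.test.jvm.args>
</properties>
```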
[jira] [Created] (HIVE-15341) Get work path instead of attempted task path in HiveHFileOutputFormat
Chaoyu Tang created HIVE-15341: -- Summary: Get work path instead of attempted task path in HiveHFileOutputFormat Key: HIVE-15341 URL: https://issues.apache.org/jira/browse/HIVE-15341 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor It would be more robust to use FileOutputCommitter.getWorkPath instead of FileOutputCommitter.getTaskAttemptPath. getTaskAttemptPath is the same as getWorkPath in the new MR2 APIs but is missing from the old MR1 APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15410) WebHCat supports get/set table property with its name containing period and hyphen
Chaoyu Tang created HIVE-15410: -- Summary: WebHCat supports get/set table property with its name containing period and hyphen Key: HIVE-15410 URL: https://issues.apache.org/jira/browse/HIVE-15410 Project: Hive Issue Type: Improvement Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Hive table properties can have a period (.) or hyphen (-) in their names; auto.purge is one example. But the WebHCat APIs support neither setting nor getting these properties, and throw the error msg "Invalid DDL identifier :property". For example: {code} [root@ctang-1 ~]# curl -s 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key1?user.name=hiveuser' {"error":"Invalid DDL identifier :property"} [root@ctang-1 ~]# curl -s -X PUT -HContent-type:application/json -d '{ "value": "true" }' 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key2?user.name=hiveuser/' {"error":"Invalid DDL identifier :property"} {code} This patch adds support for property names containing a period and/or hyphen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
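The shape of the fix can be illustrated with regular expressions (these patterns are illustrative, not WebHCat's actual validation code): a word-character-only identifier check rejects auto.purge, while one relaxed to also allow periods and hyphens accepts it:

```java
import java.util.regex.Pattern;

public class PropNameDemo {
    // Illustrative patterns, not WebHCat's real validator.
    static final Pattern STRICT  = Pattern.compile("\\w+");       // letters, digits, _
    static final Pattern RELAXED = Pattern.compile("[\\w.\\-]+"); // also . and -

    public static void main(String[] args) {
        System.out.println(STRICT.matcher("auto.purge").matches());  // false
        System.out.println(RELAXED.matcher("auto.purge").matches()); // true
        System.out.println(RELAXED.matcher("prop-key").matches());   // true
    }
}
```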
[jira] [Created] (HIVE-15446) Hive fails in recursive debug
Chaoyu Tang created HIVE-15446: -- Summary: Hive fails in recursive debug Key: HIVE-15446 URL: https://issues.apache.org/jira/browse/HIVE-15446 Project: Hive Issue Type: Bug Components: Diagnosability Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor When running Hive in recursive debug mode, for example ./bin/hive --debug:port=10008,childSuspend=y, it fails with the error msg: -- ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.Error occurred during initialization of VM agent library failed to init: jdwp -- It is because HADOOP_OPTS and HADOOP_CLIENT_OPTS both carry JVM debug options when HADOOP.sh is invoked for the child process. HADOOP_CLIENT_OPTS is appended to HADOOP_OPTS in HADOOP.sh, which leads to the duplicated debug options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15485) Investigate the DoAs failure in HoS
Chaoyu Tang created HIVE-15485: -- Summary: Investigate the DoAs failure in HoS Key: HIVE-15485 URL: https://issues.apache.org/jira/browse/HIVE-15485 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang With DoAs enabled, HoS failed with the following errors: {code} Exception in thread "main" org.apache.hadoop.security.AccessControlException: systest tries to renew a token with renewer hive at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7543) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:555) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:674) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:999) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135) {code} It is related to the change from HIVE-14383. It looks like SparkSubmit logs in to Kerberos with the passed-in hive principal/keytab and then tries to create an HDFS delegation token for user systest with renewer hive. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15742) Column stats should be preserved when it is renamed
Chaoyu Tang created HIVE-15742:
-------------------------------

Summary: Column stats should be preserved when it is renamed
Key: HIVE-15742
URL: https://issues.apache.org/jira/browse/HIVE-15742
Project: Hive
Issue Type: Improvement
Components: Statistics
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

Currently, when a column is renamed, its stats are deleted. Recreating them could be expensive, and we should preserve them if possible.
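For reference, the stats loss described above can be observed with a rename like the following (the table and column names here are made up for illustration):
{code}
-- hypothetical table; column 'id' is renamed to 'emp_id'
CREATE TABLE t (id INT, name STRING);
ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS;
-- after this rename, the column stats gathered for 'id' are deleted
-- instead of being carried over to 'emp_id'
ALTER TABLE t CHANGE id emp_id INT;
DESCRIBE FORMATTED t emp_id;   -- min/max/num_nulls etc. no longer shown
{code}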
[jira] [Created] (HIVE-15815) Allow to pass some Oozie properties to Spark in HoS
Chaoyu Tang created HIVE-15815:
-------------------------------

Summary: Allow to pass some Oozie properties to Spark in HoS
Key: HIVE-15815
URL: https://issues.apache.org/jira/browse/HIVE-15815
Project: Hive
Issue Type: Improvement
Components: Diagnosability, Spark
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
Priority: Minor

Oozie passes some of its properties (e.g. oozie.job.id) to Beeline/HS2 when it invokes the Hive2 action. If we allow these properties to be passed on to Spark in HoS, we can easily associate an Oozie workflow ID with an HoS client and its Spark job in the Spark history. This would be very helpful in diagnosing issues involving Oozie Hive2/HoS/Spark.
[jira] [Created] (HIVE-15966) Query column alias fails in order by
Chaoyu Tang created HIVE-15966:
-------------------------------

Summary: Query column alias fails in order by
Key: HIVE-15966
URL: https://issues.apache.org/jira/browse/HIVE-15966
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

Query:
{code}
select mtg.marketing_type_group_desc as marketing_type_group
from marketing_type_group mtg
order by mtg.marketing_type_group_desc;
{code}
fails with error:
{code}
2017-02-17T11:22:11,441 ERROR [eb89eafb-e100-42b1-8ff1-b3332b2e715f main]: ql.Driver (SessionState.java:printError(1116)) - FAILED: SemanticException [Error 10004]: Line 7:9 Invalid table alias or column reference 'marketing_type_group_desc': (possible column names are: marketing_type_group, prod_type)
org.apache.hadoop.hive.ql.parse.SemanticException: Line 7:9 Invalid table alias or column reference 'marketing_type_group_desc': (possible column names are: marketing_type_group, prod_type)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11501)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11449)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11417)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11395)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7761)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9655)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9554)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10450)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10328)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:11011)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:478)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11022)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:285)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:514)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1319)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1459)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1239)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1229)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code}
[jira] [Created] (HIVE-16019) Query fails when group by/order by on same column with uppercase name
Chaoyu Tang created HIVE-16019:
-------------------------------

Summary: Query fails when group by/order by on same column with uppercase name
Key: HIVE-16019
URL: https://issues.apache.org/jira/browse/HIVE-16019
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

A query with group by/order by on the same column KEY failed:
{code}
SELECT T1.KEY AS MYKEY FROM SRC T1 GROUP BY T1.KEY ORDER BY T1.KEY LIMIT 3;
{code}
[jira] [Created] (HIVE-16071) Spark remote driver misuses the timeout in RPC handshake
Chaoyu Tang created HIVE-16071:
-------------------------------

Summary: Spark remote driver misuses the timeout in RPC handshake
Key: HIVE-16071
URL: https://issues.apache.org/jira/browse/HIVE-16071
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

Based on its property description in HiveConf and the comments in HIVE-12650 (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979), hive.spark.client.connect.timeout is the timeout for the Spark remote driver to make a socket connection (channel) to the RPC server. But currently it is also used by the remote driver for RPC client/server handshaking, which is not right. Instead, hive.spark.client.server.connect.timeout should be used, and it is already used by the RpcServer in the handshake. An error like the following is usually caused by this issue, since the default hive.spark.client.connect.timeout value (1000ms) used by the remote driver for the handshake is too short.
{code}
17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
	at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
	at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
	at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
	at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
{code}
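Until the fix lands, one possible workaround sketch is to raise the client-side timeout that the remote driver currently (mis)uses for the handshake. The property names are the two discussed above; the values below are illustrative, not recommended defaults:
{code}
<!-- hive-site.xml; illustrative values only -->
<property>
  <name>hive.spark.client.connect.timeout</name>
  <!-- default 1000ms; too short while it is wrongly used for the handshake -->
  <value>30000ms</value>
</property>
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <!-- the timeout that should actually govern the handshake -->
  <value>90000ms</value>
</property>
{code}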
[jira] [Created] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
Chaoyu Tang created HIVE-16147:
-------------------------------

Summary: Rename a partitioned table should not drop its partition columns stats
Key: HIVE-16147
URL: https://issues.apache.org/jira/browse/HIVE-16147
Project: Hive
Issue Type: Bug
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

When a partitioned table (e.g. sample_pt) is renamed (e.g. to sample_pt_rename), describing its partition shows that the partition column stats are still accurate, but actually they have all been dropped. It can be reproduced as follows:

1. analyze table sample_pt compute statistics for columns;
2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS for all columns are true
{code}
...
# Detailed Partition Information
Partition Value:	[3]
Database:       	default
Table:          	sample_pt
CreateTime:     	Fri Jan 20 15:42:30 EST 2017
LastAccessTime: 	UNKNOWN
Location:       	file:/user/hive/warehouse/apache/sample_pt/dummy=3
Partition Parameters:
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
	last_modified_by     	ctang
	last_modified_time   	1485217063
	numFiles             	1
	numRows              	100
	rawDataSize          	5143
	totalSize            	5243
	transient_lastDdlTime	1488842358
...
{code}
3. describe formatted default.sample_pt partition (dummy = 3) salary: column stats exist
{code}
# col_name	data_type	min	max   	num_nulls	distinct_count	avg_col_len	max_col_len	num_trues	num_falses	comment
salary    	int      	1  	151370	0        	94            	           	           	         	          	from deserializer
{code}
4. alter table sample_pt rename to sample_pt_rename;
5. describe formatted default.sample_pt_rename partition (dummy = 3): describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS for the columns are still true.
{code}
# Detailed Partition Information
Partition Value:	[3]
Database:       	default
Table:          	sample_pt_rename
CreateTime:     	Fri Jan 20 15:42:30 EST 2017
LastAccessTime: 	UNKNOWN
Location:       	file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
Partition Parameters:
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
	last_modified_by     	ctang
	last_modified_time   	1485217063
	numFiles             	1
	numRows              	100
	rawDataSize          	5143
	totalSize            	5243
	transient_lastDdlTime	1488842358
{code}
describe formatted default.sample_pt_rename partition (dummy = 3) salary: the column stats have been dropped.
{code}
# col_name	data_type	comment
salary    	int      	from deserializer
Time taken: 0.131 seconds, Fetched: 3 row(s)
{code}
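The steps above can be condensed into a single repro script (database, table, and partition names as in the description):
{code}
ANALYZE TABLE sample_pt COMPUTE STATISTICS FOR COLUMNS;
DESCRIBE FORMATTED default.sample_pt PARTITION (dummy = 3) salary;
-- column stats present (min/max/num_nulls/distinct_count)
ALTER TABLE sample_pt RENAME TO sample_pt_rename;
DESCRIBE FORMATTED default.sample_pt_rename PARTITION (dummy = 3) salary;
-- column stats gone, yet COLUMN_STATS_ACCURATE in the partition
-- parameters still claims they are true
{code}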
[jira] [Created] (HIVE-16189) Table column stats might be invalidated in a failed table rename
Chaoyu Tang created HIVE-16189:
-------------------------------

Summary: Table column stats might be invalidated in a failed table rename
Key: HIVE-16189
URL: https://issues.apache.org/jira/browse/HIVE-16189
Project: Hive
Issue Type: Bug
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

If a table rename does not succeed because moving the data to the new renamed table folder fails, the changes in TAB_COL_STATS are not rolled back, which leaves invalid column stats behind.
[jira] [Created] (HIVE-16394) HoS does not support queue name change in middle of session
Chaoyu Tang created HIVE-16394:
-------------------------------

Summary: HoS does not support queue name change in middle of session
Key: HIVE-16394
URL: https://issues.apache.org/jira/browse/HIVE-16394
Project: Hive
Issue Type: Bug
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

mapreduce.job.queuename only takes effect when HoS executes its first query. After that, changing mapreduce.job.queuename does not change the YARN scheduler queue used by subsequent queries.
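For example, in a single Beeline session (the queue names and table are made up for illustration):
{code}
SET mapreduce.job.queuename=etl;
SELECT count(*) FROM src;    -- first HoS query; Spark app starts on queue 'etl'
SET mapreduce.job.queuename=adhoc;
SELECT count(*) FROM src;    -- still runs on 'etl': the new queue name is
                             -- ignored because the Spark session is already open
{code}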