[jira] [Created] (HIVE-24060) When CBO is disabled, an EXCEPT or INTERSECT execution throws an NPE

2020-08-24 Thread LuGuangMing (Jira)
LuGuangMing created HIVE-24060:
--

 Summary: When CBO is disabled, an EXCEPT or INTERSECT execution throws an NPE
 Key: HIVE-24060
 URL: https://issues.apache.org/jira/browse/HIVE-24060
 Project: Hive
  Issue Type: Bug
  Components: CBO, Hive
Affects Versions: 3.1.2, 3.1.0
Reporter: LuGuangMing


{code:java}
set hive.cbo.enable=false;
create table testtable(idx string, namex string) stored as orc;
insert into testtable values('123', 'aaa'), ('234', 'bbb');
explain select a.idx from (select idx,namex from testtable intersect select idx,namex from testtable) a;
{code}
 The execution throws a NullPointerException:
{code:java}
2020-08-24 15:12:24,261 | WARN  | HiveServer2-Handler-Pool: Thread-345 | Error executing statement:  | org.apache.hive.service.cli.thrift.ThriftCLIService.executeNewStatement(ThriftCLIService.java:1155)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: NullPointerException null
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:341) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:215) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:316) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:253) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:684) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:670) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:342) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.cli.thrift.ThriftCLIService.executeNewStatement(ThriftCLIService.java:1144) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:1280) ~[hive-service-3.1.0.jar:3.1.0]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) ~[hive-service-rpc-3.1.0.jar:3.1.0]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) ~[hive-service-rpc-3.1.0.jar:3.1.0]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.3.jar:0.9.3]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.3.jar:0.9.3]
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:648) ~[hive-standalone-metastore-3.1.0.jar:3.1.0]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[libthrift-0.9.3.jar:0.9.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_201]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4367) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4346) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10576) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10515) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11434) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11291) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11318) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11304) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12090) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12180) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11692) ~[hive-exec-3.1.0.jar:3.1.0]
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnal
{code}
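Since the failure is specific to the non-CBO path, a possible workaround sketch (an assumption inferred from the summary, not verified here) is to re-enable CBO for this query:
{code:sql}
-- Assumed workaround: with hive.cbo.enable=true the INTERSECT query goes
-- through the Calcite path, avoiding the SemanticAnalyzer NPE shown above.
set hive.cbo.enable=true;
explain select a.idx from (select idx, namex from testtable intersect select idx, namex from testtable) a;
{code}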

[jira] [Created] (HIVE-24061) Improve llap task scheduling for better cache hit rate

2020-08-24 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24061:
---

 Summary: Improve llap task scheduling for better cache hit rate 
 Key: HIVE-24061
 URL: https://issues.apache.org/jira/browse/HIVE-24061
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


TaskInfo is initialized with the request time and locality delay. When lots of vertices are at the same level, the "taskInfo" details are available upfront. By the time it gets to scheduling, "requestTime + localityDelay" is no longer ahead of the current time. Because of this, the scheduler misses the locality delay and ends up choosing a random node, which misses cache hits and reads data from remote storage.

E.g., this pattern was observed in Q75 of TPC-DS.

Related lines of interest in the scheduler: https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
{code:java}
boolean shouldDelayForLocality = request.shouldDelayForLocality(schedulerAttemptTime);
...
boolean shouldDelayForLocality(long schedulerAttemptTime) {
  return localityDelayTimeout > schedulerAttemptTime;
}
{code}
 

Ideally, "localityDelayTimeout" should be adjusted based on it's first 
scheduling opportunity.
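A minimal sketch of that adjustment (only the method above comes from LlapTaskSchedulerService; the extra fields are hypothetical names for illustration):
{code:java}
// Sketch: start the locality-delay window at the first scheduling attempt
// instead of at TaskInfo creation time. "firstSchedulingOpportunity" and
// "localityDelayMs" are illustrative fields, not actual scheduler state.
boolean shouldDelayForLocality(long schedulerAttemptTime) {
  if (firstSchedulingOpportunity == -1) {
    // First time the scheduler considers this task: re-base the timeout so
    // the task still gets a full locality window even if it sat in the
    // queue beyond the original requestTime + localityDelay.
    firstSchedulingOpportunity = schedulerAttemptTime;
    localityDelayTimeout = firstSchedulingOpportunity + localityDelayMs;
  }
  return localityDelayTimeout > schedulerAttemptTime;
}
{code}
With this, tasks from deep DAGs such as Q75 would still honor the locality delay on their first real scheduling opportunity instead of falling straight through to a random node.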





[jira] [Created] (HIVE-24062) Combine all table constraint RDBMS calls into one SQL call

2020-08-24 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-24062:


 Summary: Combine all table constraint RDBMS calls into one SQL call
 Key: HIVE-24062
 URL: https://issues.apache.org/jira/browse/HIVE-24062
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Sharma
Assignee: Ashish Sharma


A table can have six different types of constraints: primary key, foreign key, unique, not-null, default, and check. Each constraint type currently has its own SQL query to fetch its information from the RDBMS, which leads to six separate RDBMS calls.

The idea here is to issue one combined query that fetches all the constraint information at once and then filter the result set by constraint type, as sketched below.
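A minimal sketch of the combined call (the table and column names are placeholders for illustration, not the exact metastore schema):
{code:sql}
-- One round trip: tag every row with its constraint type so the caller can
-- split the result set client-side instead of issuing six separate queries.
SELECT c.CONSTRAINT_NAME,
       c.CONSTRAINT_TYPE,  -- discriminator: PK / FK / UNIQUE / NOT NULL / DEFAULT / CHECK
       c.COLUMN_NAME,
       c.DEFAULT_VALUE
FROM   TABLE_CONSTRAINTS c -- placeholder table name
WHERE  c.DB_NAME = ? AND c.TBL_NAME = ?
ORDER BY c.CONSTRAINT_TYPE, c.CONSTRAINT_NAME;
{code}
The caller then walks the ordered result set once and routes each row into the corresponding PrimaryKey/ForeignKey/Unique/NotNull/Default/Check holder based on the discriminator column.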





[jira] [Created] (HIVE-24063) SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo

2020-08-24 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-24063:
--

 Summary: SqlFunctionConverter#getHiveUDF handles cast before getting FunctionInfo
 Key: HIVE-24063
 URL: https://issues.apache.org/jira/browse/HIVE-24063
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Zhihua Deng


When the current SqlOperator is a SqlCastFunction, FunctionRegistry.getFunctionInfo returns null, but when hive.allow.udf.load.on.demand is enabled, HiveServer2 refers to the metastore for the function definition, and an exception stack trace like the following can be seen in the HiveServer2 log:

{code:java}
INFO exec.FunctionRegistry: Unable to look up default.cast in metastore
org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Function @hive#default.cast does not exist)
at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:5495) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfoFromMetastoreNoLock(Registry.java:788) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Registry.getQualifiedFunctionInfo(Registry.java:657) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Registry.getFunctionInfo(Registry.java:351) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:597) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.optimizer.calcite.translator.SqlFunctionConverter.getHiveUDF(SqlFunctionConverter.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:112) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:68) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.PartitionPrune$ExtractPartPruningPredicate.visitCall(PartitionPrune.java:134) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
{code}
 
So it may be better to handle the explicit cast before getting the FunctionInfo from the Registry, as sketched below. Even if there is no cast in the query, the method handleExplicitCast returns null quickly when op.kind is not SqlKind.CAST, so the reordering costs nothing for other functions.
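A minimal sketch of the proposed ordering (signatures simplified; handleExplicitCast and the registry lookup are from this report, the rest is illustrative):
{code:java}
// Illustrative sketch, not the exact SqlFunctionConverter code.
static GenericUDF getHiveUDF(SqlOperator op, RelDataType dt, int argsLength)
    throws SemanticException {
  // Try the cast path first: for SqlKind.CAST this resolves the conversion
  // directly; for any other operator kind handleExplicitCast returns null
  // immediately, so non-cast functions lose nothing.
  GenericUDF castUdf = handleExplicitCast(op, dt);
  if (castUdf != null) {
    return castUdf;
  }
  // Only now consult the registry, so "cast" is never looked up on demand
  // in the metastore when hive.allow.udf.load.on.demand is enabled.
  FunctionInfo fi = FunctionRegistry.getFunctionInfo(op.getName());
  return fi == null ? null : fi.getGenericUDF();
}
{code}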
 





[jira] [Created] (HIVE-24064) Disable Materialized View Replication

2020-08-24 Thread Arko Sharma (Jira)
Arko Sharma created HIVE-24064:
--

 Summary: Disable Materialized View Replication
 Key: HIVE-24064
 URL: https://issues.apache.org/jira/browse/HIVE-24064
 Project: Hive
  Issue Type: Bug
Reporter: Arko Sharma








[jira] [Created] (HIVE-24065) Bloom filters can be cached after deserialization in VectorInBloomFilterColDynamicValue

2020-08-24 Thread László Bodor (Jira)
László Bodor created HIVE-24065:
---

 Summary: Bloom filters can be cached after deserialization in 
VectorInBloomFilterColDynamicValue
 Key: HIVE-24065
 URL: https://issues.apache.org/jira/browse/HIVE-24065
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor








[jira] [Created] (HIVE-24066) Hive query on parquet data should identify if a column is not present in the file schema and return NULL instead of an exception

2020-08-24 Thread Jainik Vora (Jira)
Jainik Vora created HIVE-24066:
--

 Summary: Hive query on parquet data should identify if a column is not present in the file schema and return NULL instead of an exception
 Key: HIVE-24066
 URL: https://issues.apache.org/jira/browse/HIVE-24066
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.3.5
Reporter: Jainik Vora


I created a Hive table containing columns with a struct data type:

{code:java}
CREATE EXTERNAL TABLE abc_dwh.table_on_parquet (
  `context` struct<`app`:struct<`build`:string, `name`:string, `namespace`:string, `version`:string>,
                   `screen`:struct<`height`:bigint, `width`:bigint>,
                   `timezone`:string>,
  `messageid` string,
  `timestamp` string,
  `userid` string)
PARTITIONED BY (year string, month string, day string, hour string)
STORED as PARQUET
LOCATION 's3://abc/xyz';
{code}
 
All columns are nullable, hence the parquet files read by the table don't always contain all columns. If any file in a partition doesn't have the "context.app" struct and "context.app.version" is queried, Hive throws the exception below. The same happens for "context.screen".
 
{code:java}
Caused by: java.io.IOException: java.lang.RuntimeException: Primitive type appshould not doesn't match typeapp[version]
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:379)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
... 25 more
Caused by: java.lang.RuntimeException: Primitive type appshould not doesn't match typeapp[version]
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedSchema(DataWritableReadSupport.java:249)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:379)
at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:84)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:376)
... 26 more
{code}
 
Querying context.app returns NULL:
{code:java}
hive> select context.app from abc_dwh.table_on_parquet where year=2020 and month='07' and day=26 and hour='03' limit 5;
OK
NULL
NULL
NULL
NULL
NULL
{code}
 
As a workaround, I tried querying "context.app.version" only when "context.app" is not null, but that gave the same error. *To verify the CASE-based null check, I ran the query below, which should return "0" for every row, yet every row returned "1".* The distinct value of context.app for this partition is NULL, which rules out differences caused by the select with limit. Running the same query in SparkSQL returns the correct result.
{code:java}
hive> select case when context.app is null then 0 else 1 end status from abc_dwh.table_on_parquet where year=2020 and month='07' and day=26 and hour='03' limit 5;
OK
1
1
1
1
1
{code}
Hive version used: 2.3.5-amzn-0 (on AWS EMR)





[jira] [Created] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop

2020-08-24 Thread Pravin Sinha (Jira)
Pravin Sinha created HIVE-24067:
---

 Summary: TestReplicationScenariosExclusiveReplica - Wrong FS error 
during DB drop
 Key: HIVE-24067
 URL: https://issues.apache.org/jira/browse/HIVE-24067
 Project: Hive
  Issue Type: Task
Reporter: Pravin Sinha
Assignee: Pravin Sinha


In TestReplicationScenariosExclusiveReplica, the drop database operation for the primary db leads to a 'Wrong FS' error because the ReplChangeManager is associated with the replica FS.





[jira] [Created] (HIVE-24068) ReExecutionOverlayPlugin can handle DAG submission failures as well

2020-08-24 Thread Prasanth Jayachandran (Jira)
Prasanth Jayachandran created HIVE-24068:


 Summary: ReExecutionOverlayPlugin can handle DAG submission 
failures as well
 Key: HIVE-24068
 URL: https://issues.apache.org/jira/browse/HIVE-24068
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


ReExecutionOverlayPlugin handles cases where there is a vertex failure. DAG submission failures can also happen, for example in environments where the AM container died and caused DNS issues. DAG submissions are safe to retry because the DAG hasn't started executing yet; see the sketch below.
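A hedged sketch of the widened retry condition (the real plugin inspects failures through Hive's hook context; the predicate and matched messages below are illustrative only):
{code:java}
// Illustrative predicate, not the actual ReExecutionOverlayPlugin API.
private boolean isRetryableFailure(Throwable t) {
  String msg = String.valueOf(t.getMessage());
  // Vertex failure: already retried by the plugin today.
  // DAG submission failure: the DAG never started executing, so a resubmit
  // cannot duplicate completed work or side effects.
  return msg.contains("Vertex failed") || msg.contains("Failed to submit DAG");
}
{code}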





[jira] [Created] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-08-24 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-24069:
--

 Summary: HiveHistory should log the task that ends abnormally
 Key: HIVE-24069
 URL: https://issues.apache.org/jira/browse/HIVE-24069
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Zhihua Deng


When a task returns with an exitVal that is not 0, the Executor skips marking the task's return code and calling endTask. This can leave the history log incomplete for such tasks; see the sketch below.
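A hedged sketch of the fix in the task-execution path (condensed; the surrounding Executor code is simplified, with ss standing for the current SessionState):
{code:java}
// Illustrative sketch: close out the task in HiveHistory on the failure
// path too, not only when exitVal == 0.
int exitVal = task.executeTask(ss.getHiveHistory());
if (ss.getHiveHistory() != null) {
  // Record the return code and end the task even when it failed, so the
  // history log stays complete for abnormally ending tasks.
  ss.getHiveHistory().setTaskProperty(queryId, task.getId(),
      HiveHistory.Keys.TASK_RET_CODE, String.valueOf(exitVal));
  ss.getHiveHistory().endTask(queryId, task);
}
{code}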


