Re: Review Request 48886: HIVE-14052: Cleanup of structures required when LLAP access from external clients completes

2016-06-30 Thread Siddharth Seth

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48886/#review140309
---



This is getting rather complicated. We need to think through a simpler approach to 
cleaning up this data. queryComplete currently does the following:
- ObjectCacheFactory.removeLlapQueryCache(savedQueryId);
- cleans up the QueryInfo object
- cleans up directories on local disk

One possibility, which is similar but could remove the need for the delayed 
cleanup (assuming the delay exists mainly because of the directory deletion?):
I think it's possible to create the local dirs per fragment for external 
requests - since shuffle is not involved (shuffle is what requires all fragments 
of a query to live under the application dir). That directory cleanup could then 
be delinked from queryComplete.
Cleaning up the in-memory structures could be handled immediately after the 
socket is closed: obtain the lock, then clean up, blocking any new submissions 
while doing so; a sketch follows below.
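
Roughly what I have in mind for that last part - a sketch only, with made-up 
names (QueryState, cleanupOnSocketClose, registerFragment do not exist in 
QueryTracker):

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only - names do not match the real QueryTracker.
class QueryState {
  private final ReentrantLock lock = new ReentrantLock();
  private final Map<String, Object> fragments = new ConcurrentHashMap<>();
  private boolean valid = true;

  // Called when the external client's socket closes.
  void cleanupOnSocketClose() {
    lock.lock();             // blocks concurrent registerFragment calls
    try {
      valid = false;         // later submissions are rejected, not blocked forever
      fragments.clear();     // in-memory structures are freed immediately
    } finally {
      lock.unlock();
    }
  }

  // Returns false once cleanup has run; the caller creates fresh state instead.
  boolean registerFragment(String fragmentId, Object fragmentInfo) {
    lock.lock();
    try {
      if (!valid) {
        return false;
      }
      fragments.put(fragmentId, fragmentInfo);
      return true;
    } finally {
      lock.unlock();
    }
  }
}
{noformat}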


llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 154)


Nothing is ever put into this structure.



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 158)


There's a window between moving the state to ACTIVE and taking the lock in 
queryCleanup: queryComplete and registerFragment race to obtain the lock, and 
this will lead to the exception. A sketch of one way to close the window 
follows below.
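
One way to close the window, sketched with illustrative names (not the actual 
QueryTracker fields): do the state transition and the registration under the 
same lock, and re-check the state after acquiring it, so queryComplete can 
never run in between.

{noformat}
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the ACTIVE transition and the registration happen
// atomically, so queryComplete cannot slip into the window between them.
class FragmentRegistration {
  enum State { NEW, ACTIVE, COMPLETE }

  private final ReentrantLock lock = new ReentrantLock();
  private State state = State.NEW;

  boolean registerFragment(Runnable registration) {
    lock.lock();
    try {
      if (state == State.COMPLETE) {
        return false;         // queryComplete already ran; reject cleanly
      }
      state = State.ACTIVE;   // transition happens under the lock...
      registration.run();     // ...together with the registration itself
      return true;
    } finally {
      lock.unlock();
    }
  }

  void queryComplete(Runnable cleanup) {
    lock.lock();
    try {
      state = State.COMPLETE; // any later registerFragment sees this
      cleanup.run();
    } finally {
      lock.unlock();
    }
  }
}
{noformat}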



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 218)


There's a good chance this call does nothing, since fragmentComplete would 
already have been invoked by this point.
The Closeable generated earlier holds a reference to a QueryFragmentInfo that 
has otherwise been removed. I think that Closeable will get cleaned up when the 
socket closes, so we shouldn't end up accumulating QueryFragmentInfo objects - 
correct?



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 289)


Would prefer avoiding IO within a lock. Granted, this lock is for a single 
query only and will be held at a time when new fragments are unlikely to show 
up - but it is still possible for fragments to arrive.
Those fragments would then go into a tight loop waiting for this lock, since 
the query has been invalidated by this point. A sketch of moving the IO outside 
the lock follows below.
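
A minimal sketch of that (names and types are made up): snapshot the directory 
list under the lock, release it, then delete.

{noformat}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the lock only guards in-memory bookkeeping; the slow
// directory deletion happens after the lock is released.
class QueryDirCleaner {
  private final ReentrantLock queryLock = new ReentrantLock();
  private final List<File> localDirs = new ArrayList<>();
  private boolean invalidated = false;

  boolean addLocalDir(File dir) {
    queryLock.lock();
    try {
      if (invalidated) {
        return false;               // query is gone; caller must not spin on this
      }
      localDirs.add(dir);
      return true;
    } finally {
      queryLock.unlock();
    }
  }

  void cleanupQuery() {
    List<File> toDelete;
    queryLock.lock();
    try {
      invalidated = true;
      toDelete = new ArrayList<>(localDirs);
      localDirs.clear();
    } finally {
      queryLock.unlock();           // released before any IO
    }
    for (File dir : toDelete) {
      deleteRecursively(dir);       // IO happens outside the lock
    }
  }

  private static void deleteRecursively(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File c : children) {
        deleteRecursively(c);
      }
    }
    f.delete();
  }
}
{noformat}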



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 352)


Why the null check? This can go into a tight loop.



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 565)


This belongs inside QueryInfo rather than in QueryTracker.

Unrelated: we need to move all of these to an interface so that there's a clean 
separation between methods intended for use by other parts of the system and 
internal methods. Will create a jira and assign it to myself.



ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java (line 211)


When is the Writer created? Once the fragment starts executing, or when a read 
request is received from the client?

From reading the code here, my understanding is that it's created when a read 
request is received. If that's the case, there's no guarantee that we'll 
actually end up cleaning up the structures created for this query; see the 
sketch below.
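
One way to get that guarantee, as a sketch only (CleanupRegistry and both 
methods are invented names, not the LlapOutputFormatService API): register the 
cleanup when the fragment is submitted rather than when the first read arrives.

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cleanup is registered at submission time, so it runs
// even if the client never reads and no Writer is ever created.
class CleanupRegistry {
  private final Map<String, Runnable> cleanups = new ConcurrentHashMap<>();

  // Called when the fragment is submitted, not when the first read arrives.
  void onFragmentSubmitted(String queryId, Runnable cleanup) {
    cleanups.put(queryId, cleanup);
  }

  // Called when the query finishes or the client connection goes away.
  void onQueryDone(String queryId) {
    Runnable cleanup = cleanups.remove(queryId);
    if (cleanup != null) {
      cleanup.run();                // runs exactly once per query
    }
  }
}
{noformat}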



ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java (line 289)


Can this be added into the FragmentCompletionHandler interface?



ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java (line 290)


Does it make sense to return a FragmentCompletionHandler rather than a 
Closeable? A possible shape is sketched below.
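
If so, one possible shape - a sketch, not the existing interface - would be to 
have FragmentCompletionHandler extend Closeable, so call sites that only need 
close() keep working:

{noformat}
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch of the suggested interface shape.
interface FragmentCompletionHandler extends Closeable {
  // Invoked when the fragment finishes, successfully or not.
  void fragmentComplete();

  // Existing Closeable-based callers get the same cleanup by default.
  @Override
  default void close() throws IOException {
    fragmentComplete();
  }
}
{noformat}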



ql/src/test/org/apache/hadoop/hive/llap/TestLlapOutputFormat.java (line 52)


Needs tests.



- Siddharth Seth


On June 17, 2016, 10:31 p.m., Jason Dere wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48886/
> ---
> 
> (Updated June 17, 2016, 10:31 p.m.)
> 
> 
> Review request for hive and Siddharth Seth.
> 
> 
> Bugs: HIVE-14052
> https://issues.apache.org/jira/browse/HIVE-14052
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add a hook to call QueryTracker.queryComplete if there are no more 
> fragments for this query.
> This cleanup runs on a delay and can be cancelled if another fragment request 
> comes in with the same query ID.

[jira] [Created] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-06-30 Thread Nemon Lou (JIRA)
Nemon Lou created HIVE-14143:


 Summary: RawDataSize of RCFile is zero after analyze 
 Key: HIVE-14143
 URL: https://issues.apache.org/jira/browse/HIVE-14143
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 2.1.0, 1.2.1
Reporter: Nemon Lou
Assignee: Nemon Lou
Priority: Minor


After running the following analyze command, rawDataSize becomes zero for 
RCFile tables.
{noformat}
analyze table RCFILE_TABLE compute statistics;
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14142) java.lang.ClassNotFoundException for the jar in hive.reloadable.aux.jars.path for Hive on Spark

2016-06-30 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-14142:
---

 Summary: java.lang.ClassNotFoundException for the jar in 
hive.reloadable.aux.jars.path for Hive on Spark
 Key: HIVE-14142
 URL: https://issues.apache.org/jira/browse/HIVE-14142
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 2.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu


Similar to HIVE-14037, it seems HOS has the same issue: the jars in 
hive.reloadable.aux.jars.path are not available at runtime.

{noformat}
java.lang.RuntimeException: Reduce operator initialization failed
    at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:232)
    at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
    at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: xudf.XAdd
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:134)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1365)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.isDeterministic(FunctionRegistry.java:1328)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.isDeterministic(ExprNodeGenericFuncEvaluator.java:153)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.iterate(ExprNodeEvaluatorFactory.java:100)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.toCachedEvals(ExprNodeEvaluatorFactory.java:74)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:59)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:406)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:217)
    ... 15 more
Caused by: java.lang.ClassNotFoundException: xudf.XAdd
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
    ... 27 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 48886: HIVE-14052: Cleanup of structures required when LLAP access from external clients completes

2016-06-30 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48886/#review140263
---




llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 103)


Is this for all fragments, or only external ones? If it's only for external 
ones, that should be documented, and the get path should handle it appropriately.



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 158)


Hmm... was this patch updated? This can throw in many circumstances.
Why do we have to do it like this anyway? It seems the per-query state will 
repeatedly be cleaned up and recreated.



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 217)


nit



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 232)


Can fragments start before the cleanup task runs and cause it to be invalid? I 
think it might make more sense to have a delay before even considering the 
cleanup... (see the sketch below).
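
A minimal sketch of a cancellable delayed cleanup (the 30-second delay and all 
names are illustrative): schedule the cleanup with a delay, and cancel the 
pending task if a new fragment with the same query ID arrives first.

{noformat}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: cleanup runs only if no new fragment shows up within
// the delay window; a new fragment cancels the pending cleanup task.
class DelayedQueryCleanup {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private final Map<String, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();

  void onLastFragmentComplete(String queryId, Runnable cleanup) {
    pending.put(queryId, executor.schedule(cleanup, 30, TimeUnit.SECONDS));
  }

  void onNewFragment(String queryId) {
    ScheduledFuture<?> f = pending.remove(queryId);
    if (f != null) {
      f.cancel(false);              // cleanup never started; the query stays live
    }
  }
}
{noformat}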



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 248)


nit: constant?



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 293)


I don't think wait() on a Future is a real method - it's just Object::wait, 
which does not actually wait for the task to finish; get() should be called 
instead.

Also, a timeout might be helpful; see the sketch below.
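
Something along these lines - a sketch, with an arbitrary timeout:

{noformat}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: block on the Future properly, with a bounded wait,
// instead of calling Object.wait() on the Future reference.
class FutureWait {
  static void awaitCleanup(Future<Void> cleanupFuture) throws InterruptedException {
    try {
      cleanupFuture.get(30, TimeUnit.SECONDS);  // blocks until done or timeout
    } catch (TimeoutException e) {
      // Cleanup took too long; log and decide whether to retry or give up.
    } catch (ExecutionException e) {
      // The cleanup task itself failed; surface the underlying cause.
      throw new RuntimeException(e.getCause());
    }
  }
}
{noformat}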



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 416)


nit: unneeded?



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
(line 421)


should this be checked during the attempt to get the child locks?


- Sergey Shelukhin


On June 17, 2016, 10:31 p.m., Jason Dere wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48886/
> ---
> 
> (Updated June 17, 2016, 10:31 p.m.)
> 
> 
> Review request for hive and Siddharth Seth.
> 
> 
> Bugs: HIVE-14052
> https://issues.apache.org/jira/browse/HIVE-14052
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add a hook to call QueryTracker.queryComplete if there are no more 
> fragments for this query.
> This cleanup runs on a delay and can be cancelled if another fragment request 
> comes in with the same query ID.
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  ded84c1 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java 
> c7e9d32 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  a965872 
>   ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java 
> 825488f 
>   ql/src/test/org/apache/hadoop/hive/llap/TestLlapOutputFormat.java 2288cd4 
> 
> Diff: https://reviews.apache.org/r/48886/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Dere
> 
>



Re: Review Request 49347: HIVE-14111 better concurrency handling for TezSessionState - part I

2016-06-30 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49347/
---

(Updated June 30, 2016, 10:37 p.m.)


Review request for hive and Siddharth Seth.


Repository: hive-git


Description
---

.


Diffs (updated)
-

  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 0a954fc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java e154d13 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
f8f3cad 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 919b35a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 9e114c0 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java d4051a1 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
956fd29 
  service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
44463c9 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java d48b92c 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
9cb5daf 

Diff: https://reviews.apache.org/r/49347/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-14141) Fix for HIVE-14062 breaks indirect urls in beeline

2016-06-30 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-14141:
--

 Summary: Fix for HIVE-14062 breaks indirect urls in beeline
 Key: HIVE-14141
 URL: https://issues.apache.org/jira/browse/HIVE-14141
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar
Priority: Minor


Looks like the patch for HIVE-14062 breaks indirect URLs, which use environment 
variables to get the URL in beeline.

In order to reproduce this issue:

{noformat}
$ export BEELINE_URL_DEFAULT="jdbc:hive2://localhost:1"
$ beeline -u default
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14140) LLAP: package codec jars

2016-06-30 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-14140:
---

 Summary: LLAP: package codec jars
 Key: HIVE-14140
 URL: https://issues.apache.org/jira/browse/HIVE-14140
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 49288: HIVE-11402 HS2 - disallow parallel query execution within a single Session

2016-06-30 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49288/
---

(Updated June 30, 2016, 6 p.m.)


Review request for hive and Thejas Nair.


Repository: hive-git


Description
---

.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8a4ab19 
  
itests/hive-unit/src/test/java/org/apache/hive/service/cli/session/TestHiveSessionImpl.java
 d58a913 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
7341635 
  
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java
 f7b3412 

Diff: https://reviews.apache.org/r/49288/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-14139) NPE dropping permanent function

2016-06-30 Thread Rui Li (JIRA)
Rui Li created HIVE-14139:
-

 Summary: NPE dropping permanent function
 Key: HIVE-14139
 URL: https://issues.apache.org/jira/browse/HIVE-14139
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


To reproduce:
1. Start a CLI session and create a permanent function.
2. Exit current CLI session.
3. Start a new CLI session and drop the function.

Stack trace:
{noformat}
FAILED: error during drop function: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:513)
    at org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:501)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunction(FunctionRegistry.java:1532)
    at org.apache.hadoop.hive.ql.exec.FunctionTask.dropPermanentFunction(FunctionTask.java:228)
    at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1860)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1564)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1316)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1085)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1073)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14138) CBO failed for select current_database()

2016-06-30 Thread Peter Vary (JIRA)
Peter Vary created HIVE-14138:
-

 Summary: CBO failed for select current_database()
 Key: HIVE-14138
 URL: https://issues.apache.org/jira/browse/HIVE-14138
 Project: Hive
  Issue Type: Bug
Reporter: Peter Vary
Priority: Minor


When issuing the following query with hive.cbo.enable set to true:
select current_database();

the following exception is printed to the HiveServer2 logs:

2016-06-30T09:58:24,146 ERROR [HiveServer2-Handler-Pool: Thread-33] parse.CalcitePlanner: CBO failed, skipping CBO.
org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Unsupported
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:3136)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:940)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:894)
    at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
    at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:969)
    at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
    at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:712)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:280)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10795)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:438)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1159)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1146)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:464)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:451)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:295)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)