[ 
https://issues.apache.org/jira/browse/HIVE-10725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546625#comment-14546625
 ] 

Raj Bains commented on HIVE-10725:
----------------------------------

Please consider the following when looking at resource design.

1. Yes, the end goal is to prevent HS2 from becoming unresponsive or going OOM. 
It would help to add more context to that target: for example, are we targeting 
100 concurrent sessions of a particular query on a box with 8 cores and 64GB of 
RAM, with load balancing across two HS2 instances? A concrete target like that 
gives us something to measure against and helps test our resource management. 
Operational databases can handle thousands of queries per second (though the 
queries are simpler); HS2 should not be the bottleneck.

2. Admission control should be simple, but richer than just a count of queries. 
It should also be expressed in terms a database user understands: number of 
threads is a bad metric to expose because it is not in terms of user workload, 
and exposing it punts our engineering work to the end user. To see why a 
workload-relevant metric matters, assume I set up an HS2 instance for 100 short 
concurrent queries and someone throws in a 5-fact-table join - what should 
happen? Admission control should allow rules/limits on the number of queries 
AND on query attributes; the number of tables accessed or joins might be a good 
place to start. Admission control definitely needs an API, on the assumption 
that an external Workload Management module / UI will let the user configure it 
along with other aspects of workload management.
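
To make that concrete, here is a rough sketch of what attribute-based admission 
rules could look like. None of these classes or methods exist in Hive today; 
the names and limits are purely illustrative.

{code:java}
// Hypothetical sketch only - these classes do not exist in Hive today.
import java.util.concurrent.atomic.AtomicInteger;

public class AdmissionController {

  // Limits expressed in terms a database user understands.
  private final int maxConcurrentQueries;
  private final int maxTablesPerQuery;
  private final int maxJoinsPerQuery;

  private final AtomicInteger running = new AtomicInteger(0);

  public AdmissionController(int maxConcurrentQueries, int maxTablesPerQuery,
      int maxJoinsPerQuery) {
    this.maxConcurrentQueries = maxConcurrentQueries;
    this.maxTablesPerQuery = maxTablesPerQuery;
    this.maxJoinsPerQuery = maxJoinsPerQuery;
  }

  /** Attributes an external workload management UI could set limits on. */
  public interface QueryAttributes {
    int numTables();
    int numJoins();
  }

  /** Decide whether a query may enter, based on both load and query shape. */
  public boolean tryAdmit(QueryAttributes q) {
    if (q.numTables() > maxTablesPerQuery || q.numJoins() > maxJoinsPerQuery) {
      return false;                              // reject on query attributes
    }
    while (true) {
      int current = running.get();
      if (current >= maxConcurrentQueries) {
        return false;                            // reject on concurrency limit
      }
      if (running.compareAndSet(current, current + 1)) {
        return true;                             // admitted
      }
    }
  }

  /** Called when an admitted query finishes. */
  public void release() {
    running.decrementAndGet();
  }
}
{code}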

3. We'll have to implement parallel compilation at some point, so we should 
have guardrails around the allowed parallelism and make sure resource 
management will still work then. Is the lifetime of a query a pipeline with 
stages? Can we know how many resources a query takes in each stage? Should we 
allow limits on how many queries are in each stage? You might allow 100 
threads overall, but only 5 running CBO at any one time, while 50 wait on a 
stats fetch from a remote server or on query execution to finish.
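
For example, per-stage limits could be modeled as one permit pool per pipeline 
stage. This is only a sketch under the assumption that the query lifetime is 
such a pipeline; the stage names and limits are invented for illustration.

{code:java}
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Sketch: cap how many queries may occupy each pipeline stage at once,
// independently of the overall thread limit. Stage names are illustrative.
public class StageLimiter {

  public enum Stage { PARSE, CBO, STATS_FETCH, EXECUTION }

  private final Map<Stage, Semaphore> permits = new EnumMap<>(Stage.class);

  public StageLimiter() {
    permits.put(Stage.PARSE, new Semaphore(100));      // 100 threads overall
    permits.put(Stage.CBO, new Semaphore(5));          // only 5 in CBO at a time
    permits.put(Stage.STATS_FETCH, new Semaphore(50)); // 50 waiting on remote stats
    permits.put(Stage.EXECUTION, new Semaphore(50));   // 50 waiting on execution
  }

  /** Blocks if the stage is already at its limit. */
  public void enter(Stage stage) throws InterruptedException {
    permits.get(stage).acquire();
  }

  public void exit(Stage stage) {
    permits.get(stage).release();
  }
}
{code}

A query thread would call enter(Stage.CBO) before optimization and 
exit(Stage.CBO) after, so a burst of heavy queries queues at the expensive 
stage instead of exhausting memory across the board.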

4. We'll have caches inside HS2 - statistics cache, query compile cache - how 
will their resource utilization be handled?
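
One common way to keep such caches from competing with query memory is to 
bound them by weight and let them evict. Below is a sketch along those lines 
using Guava's cache (Guava is already on the HS2 classpath); the cache, sizes, 
and weigher are made-up examples, not existing HS2 code.

{code:java}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.Weigher;
import java.util.concurrent.TimeUnit;

// Sketch of a weight-bounded statistics cache so cached metadata cannot grow
// without limit and push HS2 toward OOM. All numbers are arbitrary examples.
public class StatsCacheExample {

  static final Cache<String, byte[]> STATS_CACHE = CacheBuilder.newBuilder()
      .maximumWeight(64L * 1024 * 1024)                 // cap the cache at ~64 MB
      .weigher((Weigher<String, byte[]>) (key, value) -> value.length)
      .expireAfterAccess(30, TimeUnit.MINUTES)          // drop cold entries
      .recordStats()                                    // expose hit/miss for monitoring
      .build();
}
{code}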

5. What's the performance degradation model? A good model is to admit queries 
until we reach maximum throughput. Then we queue incoming queries - increasing 
latency but not reducing throughput - degrading gracefully up to a point. After 
a certain amount of latency increase, we reject further incoming queries rather 
than letting the queue grow. Is the architecture set up to be able to reason in 
such terms? I haven't looked at the code.
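
As a rough illustration of that admit / queue / reject shape, a bounded 
executor with an explicit rejection policy behaves exactly this way: the pool 
size stands in for maximum throughput and the queue bound for acceptable added 
latency. The class and numbers below are invented for the example.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the admit -> queue -> reject degradation model using a standard
// bounded executor.
public class DegradationModelExample {

  public static void main(String[] args) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        32, 32,                              // run up to 32 queries concurrently
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(200),       // then queue up to 200 more (latency grows)
        new ThreadPoolExecutor.AbortPolicy() // past that, reject with an explicit error
    );

    for (int i = 0; i < 300; i++) {
      final int queryId = i;
      try {
        pool.submit(() -> System.out.println("running query " + queryId));
      } catch (RejectedExecutionException e) {
        // The "be upfront with the client" point: fail fast with a clear
        // error instead of letting the server degrade unpredictably.
        System.out.println("rejected query " + queryId);
      }
    }
    pool.shutdown();
  }
}
{code}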



 

> Better resource management in HiveServer2
> -----------------------------------------
>
>                 Key: HIVE-10725
>                 URL: https://issues.apache.org/jira/browse/HIVE-10725
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2, JDBC
>    Affects Versions: 1.3.0
>            Reporter: Vaibhav Gumashta
>
> We have various ways to control the number of queries that can be run on one 
> HS2 instance (max threads, thread pool queuing etc). We also have ways to run 
> multiple HS2 instances using dynamic service discovery. We should do a better 
> job at:
> 1. Monitoring resource utilization (sessions, ophandles, memory, threads etc).
> 2. Being upfront with the client when we cannot accept new queries.
> 3. Throttle among different server instances in case dynamic service 
> discovery is used.
> 4. Consolidate existing ways to control #queries into a simpler model.
> 5. See if we can recommend reasonable values for OS resources or provide 
> alerts if we run out of those.
> 6. Health reports, server status API (to get number of queries, sessions etc).



