[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079445#comment-17079445
 ] 

Peter Vary commented on HIVE-21354:
-----------------------------------

Same for 4.0:
{code:java}
0: jdbc:hive2://localhost:10003> explain locks select * from acid_part;
+----------------------------------------+
|                Explain                 |
+----------------------------------------+
| LOCK INFORMATION:                      |
| default.acid_part -> SHARED_READ       |
| default.acid_part.j=1 -> SHARED_READ   |
| default.acid_part.j=10 -> SHARED_READ  |
| default.acid_part.j=2 -> SHARED_READ   |
+----------------------------------------+
{code}
I think it would be worth adding a new configuration value for the maximum 
number of partition-level locks (hive.lock.escalation.num?). If the number of 
locks is above this level, we should request a table-level lock instead of 
partition-level locks. For example:
* -1 to turn off lock escalation (the default, as this is the 
backward-compatible behavior)
* 1 to prevent the use of partition-level locks

The user should be able to change this setting at the session level. For a 
long-running query where it is important to allow as much concurrency as 
possible, the user can set it to -1; for a session that runs fast queries where 
latency matters more, the user can set it to 1 instead.

The easiest place to implement it would be {{AcidUtils.makeLockComponents}}.
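
A minimal, standalone sketch of the escalation decision described above. It 
does not use Hive's real classes or the actual signature of 
{{AcidUtils.makeLockComponents}}: the class {{LockEscalationSketch}}, the 
method {{planLocks}}, and the string-based lock targets are hypothetical names 
for illustration only. It also assumes that escalation triggers when the 
number of partition-level locks reaches the threshold, so that a value of 1 
prevents all partition-level locks, and that any value below 1 disables 
escalation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hive.
final class LockEscalationSketch {

    /**
     * Returns the lock targets to request for one table.
     *
     * @param table      qualified table name, e.g. "default.acid_part"
     * @param partitions partition specs the query touches, e.g. "j=1"
     * @param threshold  assumed semantics: values below 1 disable escalation;
     *                   otherwise escalate once partitions.size() >= threshold
     */
    static List<String> planLocks(String table, List<String> partitions, int threshold) {
        List<String> locks = new ArrayList<>();
        // The table-level entry is always present, as in the explain locks output.
        locks.add(table);

        boolean escalate = threshold >= 1 && partitions.size() >= threshold;
        if (!escalate) {
            // Below the threshold (or escalation off): one entry per partition.
            for (String partition : partitions) {
                locks.add(table + "." + partition);
            }
        }
        // When escalating, the single table-level entry stands in for all partitions.
        return locks;
    }
}
```

Under these assumptions, for the three partitions of {{acid_part}} shown 
above, a threshold of -1 yields the same four lock targets as today, while a 
threshold of 1 yields only {{default.acid_part}}.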

[~belugabehr]: Do you plan to work on this?

Thanks,
Peter
 

> Lock The Entire Table If Majority Of Partitions Are Locked
> ----------------------------------------------------------
>
>                 Key: HIVE-21354
>                 URL: https://issues.apache.org/jira/browse/HIVE-21354
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: David Mollitor
>            Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
