[jira] [Commented] (HIVE-16334) Query lock contains the query string, which can cause OOM on ZooKeeper

Sahil Takiar (JIRA) Wed, 05 Apr 2017 09:07:04 -0700

    [ https://issues.apache.org/jira/browse/HIVE-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957114#comment-15957114 ]


Sahil Takiar commented on HIVE-16334:
-------------------------------------

[~pvary]

{quote}
The query is stored in HS2, but the locks should be displayed on every client 
even when HS2 is in HA. So if we want to keep the query string in a shared 
database, or another ZooKeeper node next to the lock.
{quote}

I'm not super familiar with HS2 HA, so that's entirely possible. However, there 
should be ways to get around that. I would hope that HA supports 
{{TCLIService.GetOperationStatus}}, which should return the query string given 
the operation id.

{quote}
Since the query string is read by humans, I do not think that it is worth to 
display a string which is longer than 1000 chars.
{quote}

That's probably true, but I can think of a few caveats: (1) {{show locks}} could 
be read via JDBC and its output parsed by some Java code, (2) in production 
clusters, there are often several queries that are very long and very similar 
except for a few changes to constant values.
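
To make caveat (2) concrete, here is a minimal sketch of the truncation being discussed. The class and method names are hypothetical and not taken from the attached patch; the 1000-character default is simply the figure quoted above.

```java
// Hypothetical helper, not part of Hive: caps the query text stored in a lock.
final class LockQueryTruncation {
    // Default taken from the 1000-character figure mentioned in this thread.
    static final int DEFAULT_MAX_LENGTH = 1000;

    static String truncate(String query, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        if (query == null || query.length() <= maxLength) {
            return query;
        }
        int end = maxLength;
        // Avoid cutting a surrogate pair in half at the boundary.
        if (end > 0 && Character.isHighSurrogate(query.charAt(end - 1))) {
            end--;
        }
        return query.substring(0, end);
    }

    public static void main(String[] args) {
        // Two long queries that differ only in the last constant.
        StringBuilder in = new StringBuilder("SELECT * FROM t WHERE c IN (");
        for (int i = 0; i < 500; i++) {
            in.append(i).append(',');
        }
        String q1 = in + "1)";
        String q2 = in + "2)";
        // Prints whether the two truncated strings are identical.
        System.out.println(
            truncate(q1, DEFAULT_MAX_LENGTH).equals(truncate(q2, DEFAULT_MAX_LENGTH)));
    }
}
```

Once both queries are cut at the same length, anything that distinguishes them after that point is lost, so {{show locks}} output would no longer tell them apart.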

{quote}
ZooKeeper scales fairly well if we limit the size of the nodes, so considering 
the points above I think it might not worth the complexity to externalize the 
query string from the locks.
{quote}

I think there are some caveats: (1) ZK isn't really designed to store massive 
Hive query strings, (2) in production clusters, ZK may be shared amongst 
multiple distributed systems (HBase, Kafka, etc.), (3) ZK does scale well in 
some ways, but in terms of memory usage it is limited to the RAM of a single 
machine. IMO the less load on ZK the better.

That being said, I still think your approach is a good improvement. All the 
points above can be considered follow-up items. I'll take a look at your RB.

A few more things to think about:
* The description of this JIRA mentions that when there are a high number of 
partitions, there can be increased load on ZK. Does that mean the lock for each 
partition is storing a separate copy of the query string? If so, we should 
definitely fix that, e.g. by having each lock reference a single shared copy of 
the query string, or by using string interning
* If there was an OOM in ZK, it would be nice to get a heap dump so we can 
analyze it using standard analysis tools
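
As a rough sketch of the "reference to the same query string" idea from the first bullet: store the query text once under a query id and let each partition lock carry only that id. The two maps below are in-memory stand-ins for whatever shared storage would be used; nothing here is ZooKeeper or Hive API, and all names are hypothetical. It assumes each lock node serializes its own payload, in which case sharing a reference inside one JVM would not by itself shrink what is stored per node.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hive code: one stored query, many small lock payloads.
final class SharedQueryLocks {
    // Stand-in for shared storage: query id -> full query text (stored once).
    private final Map<String, String> queryStore = new HashMap<>();
    // Stand-in for lock nodes: lock path -> payload (only the query id).
    private final Map<String, String> lockPayloads = new HashMap<>();

    void lockPartitions(String queryId, String query, List<String> partitionPaths) {
        queryStore.put(queryId, query);
        for (String path : partitionPaths) {
            lockPayloads.put(path, queryId);
        }
    }

    // Resolves the full query text for a lock, or null if the lock is unknown.
    String queryForLock(String lockPath) {
        String queryId = lockPayloads.get(lockPath);
        return queryId == null ? null : queryStore.get(queryId);
    }

    // Total characters held across both stores, for a rough size comparison.
    long storedChars() {
        long total = 0;
        for (String q : queryStore.values()) {
            total += q.length();
        }
        for (String id : lockPayloads.values()) {
            total += id.length();
        }
        return total;
    }
}
```

With this layout the stored size grows with the number of partitions times the id length, plus one copy of the query, rather than with the number of partitions times the query length. The cost is an extra lookup when displaying locks, and the stored query would need to be cleaned up when the last lock is released.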

> Query lock contains the query string, which can cause OOM on ZooKeeper
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16334
>                 URL: https://issues.apache.org/jira/browse/HIVE-16334
>             Project: Hive
>          Issue Type: Improvement
>          Components: Locking
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>         Attachments: HIVE-16334.patch
>
>
> When there are big number of partitions in a query this will result in a huge 
> number of locks on ZooKeeper. Since the query object contains the whole query 
> string this might cause serious memory pressure on the ZooKeeper services.
> It would be good to have the possibility to truncate the query strings that 
> are written into the locks



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
