[ 
https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079001#comment-16079001
 ] 

Sunitha Beeram commented on HIVE-16844:
---------------------------------------

Thanks [~mithun] for the review.

I am still getting familiar with the Hive code base, so please pardon my ignorance 
(and the length of this comment)! Later in this comment I have more notes on 
the connection leak itself, but essentially you are right - we are pulling the 
plug on the connections while another thread is possibly still using them.
Also, on closer inspection of the code/comments I came across 
[this|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L8344]:
{quote}
   * The NucleusContext cache gets freed up only on calling a close on it.
   * We're not closing NucleusContext since it does a bunch of other things 
which we don't want.
{quote}

It looks like a decision was made earlier to not close...
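
For concreteness, here is a rough sketch (not the actual ObjectStore code - the 
class name, synchronization and placement are my assumptions) of what closing 
the factory before clearing the reference would look like with the standard 
javax.jdo API:
{code:java}
// Rough sketch only (not the actual ObjectStore code): close the old factory
// so its BoneCP/HikariCP pool is released before the reference is dropped.
// Class name, synchronization and placement are assumptions on my part.
import java.util.Properties;

import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManagerFactory;

public class PmfHolder {
  private static PersistenceManagerFactory pmf;

  // Called when a new Configuration/property set is detected.
  public static synchronized void replaceFactory(Properties newProps) {
    if (pmf != null) {
      // Closing the factory shuts down its underlying connection pool;
      // merely nulling the reference leaves the old pool's connections open.
      pmf.close();
    }
    pmf = JDOHelper.getPersistenceManagerFactory(newProps);
  }
}
{code}
Of course, closing the factory while another thread still holds a 
PersistenceManager from it is exactly the hazard described above, so this would 
need either coordination or a retry path on the callers' side.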

I am not sure yet if there is a better way to handle it; however, a few 
questions pop up:
- When the pmf reference is cleared, the assumption seems to be that the 
earlier connection pool gets GC'ed - is that right? It looks like that is not 
happening in our case. We are using BoneCP(!), but I do recollect trying 
HikariCP as well and the problem was still reproducible; however, I did not 
probe further to see whether Hikari offers idle connection cleanup etc. My 
unvalidated assumption is that we have the pool's minimum connection count set 
to 1 and, with no idle connection reclamation, active references are preventing 
GC...
- Another way of looking at the issue, albeit contrived: the connection a 
thread is using can become invalid if the remote end closes it for whatever 
reason, and the application should deal with that gracefully - the connection 
pool being closed is just a degenerate instance of this. So, if we had code 
similar to what we have in RetryingHMSHandler, it would handle the state 
invalidation gracefully, right? (See the sketch after this list.)
- Another issue with not closing the pool is that parts of the code end up 
using the new JDO config while other thread(s) could continue to use the older 
config indefinitely. That's probably not desirable either, right?
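
On the second bullet, here is a hypothetical illustration of the style of retry 
wrapping I mean (the class and method names are made up; the real 
RetryingHMSHandler is a reflection-based proxy and considerably more involved):
{code:java}
// Hypothetical illustration of the retry idea; not RetryingHMSHandler itself.
import java.util.concurrent.Callable;

import javax.jdo.JDOException;

public class JdoRetry {
  public static <T> T callWithRetry(Callable<T> op, int maxAttempts, long sleepMs)
      throws Exception {
    JDOException lastError = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (JDOException e) {
        // A stale or force-closed connection typically surfaces as a JDO-level
        // error; the caller would re-initialize its store state and retry.
        lastError = e;
        Thread.sleep(sleepMs);
      }
    }
    throw lastError;
  }
}
{code}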

Early last week I spent some time looking through the code a bit more, since up 
until then I wasn't sure which execution paths actually led to the leak. I 
believe one possibility is this:
a) Most calls to the Metastore go through RetryingHMSHandler, which has the 
logic to retry and reload the configuration in case of errors. So this code 
path sees the new configuration, if one is available, soon after a DB error.
b) Calls going through DBTokenStore access RawStore directly and hence not only 
miss the retry logic, but also continue to use the old configuration through a 
HiveConf object instantiated at startup.

It looks like in our case, with frequent calls to getDelegationToken, we end up 
with cases where (a) and (b) interleave, resulting in connection pool objects 
getting leaked.
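
To make that interleaving concrete, here is a purely illustrative model (all 
names are hypothetical; this is not Hive code) of how dropping the reference on 
path (a), while path (b) keeps using the pool it grabbed at startup, leaves the 
old pool open and still reachable:
{code:java}
// Toy model of the interleaving described above; not Hive code.
import java.util.ArrayList;
import java.util.List;

public class LeakModel {
  static class Pool {
    boolean closed = false;
    void close() { closed = true; }
  }

  // Analogue of the static pmf reference in ObjectStore.
  private static Pool currentPool = new Pool();
  // Pools dropped without close(); they stay open and, because a thread still
  // references them, never become collectible.
  private static final List<Pool> leaked = new ArrayList<>();

  // Path (a): reacts to a new configuration by swapping the pool, but only
  // drops the old reference instead of closing the old pool first.
  static synchronized void replacePoolOnNewConf() {
    leaked.add(currentPool);  // old pool left open -> leaked
    currentPool = new Pool();
  }

  // Path (b): a DBTokenStore-style caller that keeps using whatever pool it
  // obtained at startup, under the original configuration.
  private static final Pool poolGrabbedAtStartup = currentPool;

  public static void main(String[] args) {
    replacePoolOnNewConf();  // path (a) runs after a DB error
    System.out.println("old pool closed? " + poolGrabbedAtStartup.closed); // false
    System.out.println("leaked pools: " + leaked.size());                  // 1
  }
}
{code}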

I am just beginning to catch up on the implementation of 
HCatClientHMSImpl/HCatUtil and will need to look through that code a bit more 
to understand which JDO configuration ultimately gets used, etc.

I do appreciate any comments/pointers you have regarding this. Also, let me 
know if you think it's better to revert the change until this is sorted out.



> Fix Connection leak in ObjectStore when new Conf object is used
> ---------------------------------------------------------------
>
>                 Key: HIVE-16844
>                 URL: https://issues.apache.org/jira/browse/HIVE-16844
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Sunitha Beeram
>            Assignee: Sunitha Beeram
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16844.1.patch
>
>
> The code path in ObjectStore.java currently leaks BoneCP (or Hikari) 
> connection pools when a new configuration object is passed in. The code needs 
> to ensure that the persistence-factory is closed before it is nullified.
> The relevant code is 
> [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290].
>  Note that pmf is set to null, but the underlying connection pool is not 
> closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
