[ https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079001#comment-16079001 ]
Sunitha Beeram commented on HIVE-16844: --------------------------------------- Thanks [~mithun] for the review. I am still getting familiar with the Hive code base, so do pardon my ignorance (and the length of this comment)! Later in this comment I have more notes on the connection leak itself, but essentially you are right - we are pulling the plug on the connections while another thread is possibly using it. Also, on closer inspection of the code/comments I came across [this|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L8344]: {quote} * The NucleusContext cache gets freed up only on calling a close on it. * We're not closing NucleusContext since it does a bunch of other things which we don't want. {quote} It looks like a decision was made earlier to not close... I am not sure yet if there is a better way to handle it, however, there are a few questions that pop up: - When the pmf reference is cleared, it seems like the assumption is that the earlier connection pool is GC'ed - is that right? It looks like thats not occurring in our case. We are using BoneCP(!), but I do recollect trying HikariCP as well and the problem was reproducible. However, I did not probe further to see if Hikari offers idle connection cleanup etc. My unvalidated assumption is that we have min connections in the pool as 1 and with no idle connection reclamation, we have active references thats preventing GC... - Another way of looking at the issue, albeit contrived, is that the connection being used by a thread can become invalid if the remote end closes it for whatever reason and the application should deal with it gracefully - and the connection pool getting closed is a degenerate instance of this. So, if we have code similar to what we have in RetryingHMSHandler, it would be handle the state invalidation gracefully, right? - Another issue with not closing the pool is that we now have parts of code using different JDO config while the other thread(s) could continue to use older config indefinitely. Thats perhaps not desirable either, right? Early last week I had spent some time looking through the code a bit more, as up until then I wasn't sure what execution paths actually led to the leak. I believe one possibility is this: a) Most calls to the Metastore go through RetryingHMSHandler that has the logic to retry and reload the configuration in case of errors. So, this code path sees new configuration if one is available soon after a DB error. b) Calls going through DBTokenStore, access RawStore directly and hence not only miss out on the retrying logic, but also continue to use old configuration through a HiveConf object instantiated at startup. It looks like in our case, with frequent calls to getDelegationToken, we end up with cases where (a) and (b) interleave, resulting in connection pool objects getting leaked. I am just beginning to catch up on the implementation of HCatClientHMSImpl/HCatUtil and will need to look through a bit more through that code on the JDO configuration that will ultimately get used etc. I do appreciate any comments/pointers you have regarding this. Also, let me know if you think its better to revert the change until this is sorted out. > Fix Connection leak in ObjectStore when new Conf object is used > --------------------------------------------------------------- > > Key: HIVE-16844 > URL: https://issues.apache.org/jira/browse/HIVE-16844 > Project: Hive > Issue Type: Bug > Components: Metastore > Reporter: Sunitha Beeram > Assignee: Sunitha Beeram > Fix For: 3.0.0 > > Attachments: HIVE-16844.1.patch > > > The code path in ObjectStore.java currently leaks BoneCP (or Hikari) > connection pools when a new configuration object is passed in. The code needs > to ensure that the persistence-factory is closed before it is nullified. > The relevant code is > [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290]. > Note that pmf is set to null, but the underlying connection pool is not > closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)