[ https://issues.apache.org/jira/browse/HIVE-20192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551100#comment-16551100 ]
Vihang Karajgaonkar commented on HIVE-20192:
--------------------------------------------

{quote}
The PersistenceManagerFactory object "pmf" is a static object that keeps references to the allocated PersistenceManager objects in the pmCache map. That's why a PersistenceManager doesn't get GC'ed and needs an explicit shutdown on any exception. In this case we retry instead of closing the thread, and the retry overwrites the pm object and leaks the old one.
{quote}

I see. Thanks for the explanation.

{quote}
I think the overwrite performed by cacheThreadLocalRawStore doesn't cause any leak, because it overwrites the entry with the thread-local rawStore that is active in this thread. If the thread-local rawStore has changed, that means the older one was already shut down gracefully before the new one was created. Also, threadRawStoreMap shouldn't pile up, as we use the same thread id.
{quote}

I think you are right. It looks like the cleanup model is optimistic: when a thread is reused, the {{Hive#getInternal}} method checks whether this thread-local rawstore can be reused and cleans it up if the owner is different or the config is not compatible. So we are good in the thread-reuse case, because the object being overwritten in {{ThreadWithGarbageCleanup.threadRawStoreMap}} is either replaced with the same object or replaced only after the previous one was closed. That code path looks good to me.

This is all very tricky business, and I hope no other code path is still leaking the rawstore.

This patch looks good to me. +1

> HS2 with embedded metastore is leaking JDOPersistenceManager objects.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20192
>                 URL: https://issues.apache.org/jira/browse/HIVE-20192
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.0.0, 3.1.0, 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: HiveServer2, pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20192.01.patch
>
>
> HiveServer2 instances were crashing every 3-4 days, and HS2 was observed in an unresponsive state. Full GC (FGC) collections were also happening regularly.
> The JXray report shows that pmCache (the collection of JDOPersistenceManager objects) occupies 84% of the heap, and that there are around 16,000 references to UDFClassLoader.
> {code:java}
> 10,759,230K (84.7%) Object tree for GC root(s) Java Static org.apache.hadoop.hive.metastore.ObjectStore.pmf
> - org.datanucleus.api.jdo.JDOPersistenceManagerFactory.pmCache ↘ 10,744,419K (84.6%), 1 reference(s)
>   - j.u.Collections$SetFromMap.m ↘ 10,744,419K (84.6%), 1 reference(s)
>     - {java.util.concurrent.ConcurrentHashMap}.keys ↘ 10,743,764K (84.5%), 16,872 reference(s)
>       - org.datanucleus.api.jdo.JDOPersistenceManager.ec ↘ 10,738,831K (84.5%), 16,872 reference(s)
>       ... 3 more references together retaining 4,933K (< 0.1%)
>     - java.util.concurrent.ConcurrentHashMap self 655K (< 0.1%), 1 object(s)
>     ... 2 more references together retaining 48b (< 0.1%)
> - org.datanucleus.api.jdo.JDOPersistenceManagerFactory.nucleusContext ↘ 14,810K (0.1%), 1 reference(s)
> ... 3 more references together retaining 96b (< 0.1%)
> {code}
> When the RawStore object is re-created, the new instance is not allowed to replace the entry in ThreadWithGarbageCleanup.threadRawStoreMap. As a result, the new RawStore never gets cleaned up when the thread exits.
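The leak described above can be modelled in a few lines of plain Java. This is a minimal sketch, not Hive's or DataNucleus's code. PmFactory, Pm, Store, StoreRegistry, and LeakDemo are hypothetical stand-ins for JDOPersistenceManagerFactory.pmCache, JDOPersistenceManager, RawStore, and ThreadWithGarbageCleanup.threadRawStoreMap. Modelling "not allowed to be updated" as putIfAbsent is also an assumption about the shape of the bug, not a description of the actual patch.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Declared first so single-file launchers pick up main().
class LeakDemo {
    // Simulates one thread: create a store, shut it down and re-create it
    // (e.g. after a retry), then exit. Returns how many managers are still
    // strongly reachable from the static cache afterwards.
    static int run(boolean fixed) {
        PmFactory.PM_CACHE.clear();
        StoreRegistry.THREAD_RAW_STORE_MAP.clear();
        long tid = 1L;
        Store first = new Store();
        register(tid, first, fixed);
        first.shutdown();                 // old store closed gracefully
        Store second = new Store();       // re-created store
        register(tid, second, fixed);
        StoreRegistry.cleanupOnExit(tid); // thread exits
        return PmFactory.PM_CACHE.size();
    }

    private static void register(long tid, Store store, boolean fixed) {
        if (fixed) {
            StoreRegistry.cacheFixed(tid, store);
        } else {
            StoreRegistry.cacheLeaky(tid, store);
        }
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + run(false));
        System.out.println("fixed: " + run(true));
    }
}

// Hypothetical stand-in for the static factory: it keeps a strong reference
// to every manager it hands out until that manager is closed.
class PmFactory {
    static final Set<Pm> PM_CACHE = ConcurrentHashMap.newKeySet();

    static Pm getPersistenceManager() {
        Pm pm = new Pm();
        PM_CACHE.add(pm);
        return pm;
    }
}

// Hypothetical stand-in for a persistence manager; closing it is the only
// way to drop it from the static cache.
class Pm {
    void close() {
        PmFactory.PM_CACHE.remove(this);
    }
}

// Hypothetical stand-in for a RawStore that owns one manager.
class Store {
    final Pm pm = PmFactory.getPersistenceManager();

    void shutdown() {
        pm.close();
    }
}

// Hypothetical stand-in for the per-thread registry consulted at thread exit.
class StoreRegistry {
    static final Map<Long, Store> THREAD_RAW_STORE_MAP = new ConcurrentHashMap<>();

    // Leaky variant (assumption): a re-created store never replaces the entry.
    static void cacheLeaky(long threadId, Store store) {
        THREAD_RAW_STORE_MAP.putIfAbsent(threadId, store);
    }

    // Fixed variant (assumption): the current store replaces the stale entry.
    static void cacheFixed(long threadId, Store store) {
        THREAD_RAW_STORE_MAP.put(threadId, store);
    }

    // Called when the thread exits: only the registered store is shut down.
    static void cleanupOnExit(long threadId) {
        Store store = THREAD_RAW_STORE_MAP.remove(threadId);
        if (store != null) {
            store.shutdown();
        }
    }
}
```

Running LeakDemo prints `leaky: 1` followed by `fixed: 0`. In the leaky variant, cleanup at thread exit closes the stale first store a second time (a no-op here). Meanwhile, the re-created store's manager stays reachable from the static cache, which is the retention pattern the JXray tree shows.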