[ https://issues.apache.org/jira/browse/HIVE-20192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550297#comment-16550297 ]

Sankar Hariappan commented on HIVE-20192:
-----------------------------------------

Thanks for the feedback [~vihangk1]!

{quote}

I see that in the initializeHelper method, if there is an exception, you are 
issuing a shutdown on the ObjectStore to clean up the persistenceManager, but 
shouldn't this uncaught exception cause the thread to be closed in the first 
place and thereby clean up the threadlocal rawstore anyway?

{quote}

- The PersistenceManagerFactory object "pmf" is a static object which keeps 
references to the allocated PersistenceManager objects in the pmCache map. 
That's why a PersistenceManager doesn't get GC'ed and needs an explicit 
shutdown on any exception. In this case we retry instead of closing the thread, 
which overwrites the pm object and leaks the old one.
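
For illustration only, here is a self-contained toy sketch (not Hive code; the 
class and field names are invented) of why a static factory cache pins the 
objects it hands out until they are explicitly closed:

{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration of the leak mechanism: the static cache is a GC root, so a
// PersistenceManager-like object stays reachable until it is explicitly closed.
public class PmCacheLeakDemo {

  static class Pm {
    void close() {
      FACTORY_CACHE.remove(this);   // explicit shutdown drops the cache reference
    }
  }

  // Static GC root, analogous to the pmCache held by the static "pmf"
  static final Set<Pm> FACTORY_CACHE =
      Collections.newSetFromMap(new ConcurrentHashMap<Pm, Boolean>());

  static Pm getPm() {
    Pm pm = new Pm();
    FACTORY_CACHE.add(pm);
    return pm;
  }

  public static void main(String[] args) {
    Pm first = getPm();
    getPm();                                   // "retry" without closing the first one
    System.out.println(FACTORY_CACHE.size());  // 2 -> the first Pm is leaked; GC cannot reclaim it
  }
}
{code}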

{quote}

Is it better to issue a shutdown on the threadlocal rawstore from 
{{ThriftBinaryCLIService#deleteContext}} method instead?

{quote}

- That's a good point. But I'm not sure if there is any reason for keeping the 
current implementation with threadRawStoreMap.

{quote}

Based on my understanding, it looks like we are trying to keep track of the 
threadlocal rawstore using a custom implementation of Thread in a map and depend 
on the finalize method to do cleanup. This in theory means that cleanup only 
happens when the threads are GCed instead of as soon as sessions are closed. 
Also, if a thrift thread is reused there would already be an entry in the 
{{threadRawStoreMap}}, and {{cacheThreadLocalRawStore}} will overwrite that 
entry, which can also cause a leak. This can potentially be verified by keeping 
the min threads and max threads equal (so no thread is ever GCed) while 
repeatedly opening and closing connections to HMS; eventually these 
threadLocalRawstores should pile up.

{quote}

- I think overwriting the entry via cacheThreadLocalRawStore doesn't cause any 
leak, because it overwrites it with the thread-local rawStore which is active in 
this thread. If the thread-local rawStore has changed, it means the older one 
was already shut down gracefully before being re-created. Also, 
threadRawStoreMap shouldn't pile up as we use the same thread id.
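
To make the overwrite argument concrete, a toy sketch (again not the actual 
ThreadWithGarbageCleanup code; names are invented) of a map keyed by thread id: 
re-caching from the same thread replaces the existing entry, so the map cannot 
grow beyond one entry per live thread:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of caching a per-thread RawStore under the thread id: a second
// cache call from the same thread overwrites the entry instead of adding one.
public class ThreadRawStoreMapDemo {

  static final Map<Long, String> threadRawStoreMap = new ConcurrentHashMap<>();

  static void cacheThreadLocalRawStore(String rawStore) {
    threadRawStoreMap.put(Thread.currentThread().getId(), rawStore);
  }

  public static void main(String[] args) {
    cacheThreadLocalRawStore("rawStore-1");
    cacheThreadLocalRawStore("rawStore-2");       // same thread id -> entry is replaced
    System.out.println(threadRawStoreMap.size()); // prints 1; the map does not pile up
  }
}
{code}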

Please let me know if I've missed anything.

> HS2 with embedded metastore is leaking JDOPersistenceManager objects.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20192
>                 URL: https://issues.apache.org/jira/browse/HIVE-20192
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.0.0, 3.1.0, 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: HiveServer2, pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20192.01.patch
>
>
> HiveServer2 instances were crashing every 3-4 days, and HS2 was observed in an 
> unresponsive state. Full GC collections were also happening regularly.
> From the JXray report it is seen that pmCache (a list of JDOPersistenceManager 
> objects) is occupying 84% of the heap and there are around 16,000 references 
> to UDFClassLoader.
> {code:java}
> 10,759,230K (84.7%) Object tree for GC root(s) Java Static 
> org.apache.hadoop.hive.metastore.ObjectStore.pmf
> - org.datanucleus.api.jdo.JDOPersistenceManagerFactory.pmCache ↘ 10,744,419K 
> (84.6%), 1 reference(s)
>   - j.u.Collections$SetFromMap.m ↘ 10,744,419K (84.6%), 1 reference(s)
>     - {java.util.concurrent.ConcurrentHashMap}.keys ↘ 10,743,764K (84.5%), 
> 16,872 reference(s)
>       - org.datanucleus.api.jdo.JDOPersistenceManager.ec ↘ 10,738,831K 
> (84.5%), 16,872 reference(s)
>         ... 3 more references together retaining 4,933K (< 0.1%)
>     - java.util.concurrent.ConcurrentHashMap self 655K (< 0.1%), 1 object(s)
>       ... 2 more references together retaining 48b (< 0.1%)
> - org.datanucleus.api.jdo.JDOPersistenceManagerFactory.nucleusContext ↘ 
> 14,810K (0.1%), 1 reference(s)
> ... 3 more references together retaining 96b (< 0.1%){code}
> When the RawStore object is re-created, it is not allowed to be updated in 
> ThreadWithGarbageCleanup.threadRawStoreMap, which means the new RawStore never 
> gets cleaned up when the thread exits.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
