[ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174923#comment-13174923 ]
Mithun Radhakrishnan commented on THRIFT-1468:
----------------------------------------------

I can confirm that the newest patch works. (This patch also has the comments you'd requested.) Ran overnight, and the latest heap-dumps don't indicate a build-up of WeakHashMap$Entry objects.

Would it be possible to get this patch into a build of 0.5.0 (on a maven-repository), and not just 0.9? That'd let Hive and HCatalog take advantage of this fix.

> Memory leak in TSaslServerTransport
> -----------------------------------
>
>                 Key: THRIFT-1468
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1468
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.5, 0.9
>            Reporter: Mithun Radhakrishnan
>              Labels: OOM, WeakHashMap, WeakReference
>         Attachments: Main.java,
> THRIFT-1468-Memory_leak_in_TSaslServerTransport.patch, thrift-1468.patch
>
>
> I'm working on the HCatalog project. HCatalog uses a (slightly dated) version
> of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a
> continuously running process that serves (meta)data over thrift. (The bug I
> describe is related to HCATALOG-183.)
> We observed that on running the HCatalog-server with continuous
> client-requests, the memory footprint of the server grows steadily, until we
> see an OutOfMemoryError exception. I took a memory snapshot of the running
> process, to check for leaks. I noticed that the majority of the memory (over
> 1.3GB) was being consumed by the
> org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There
> were over 52000 instances of WeakHashMap$Entry, consuming 3MB of
> shallow-heap, and 1.3GB of retained heap.
> I suspect that entries in the WeakHashMap (transportMap) are not being
> collected during GC, as is expected in code. That would only be so if there
> are outstanding hard-references to the key in the map (TTransport).
> From the code in TSaslTransport and TSaslServerTransport, it appears that
> there is an inadvertent cyclic reference that the runtime is unable to detect:
> 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport",
> i.e. TTransport.
> 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport,
> TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the
> decorating TSaslServerTransport.
> From #2, an entry can only be GCed if there's no outstanding hard-reference
> to the TTransport. But from #1, the hard-reference comes from the value-part
> of the hashmap entry. The runtime can't deduce that there's a cycle,
> presumably because it's not explicit.
> (I'll be attaching a sample program to better illustrate the WeakHashMap
> behaviour, in case I've botched the explanation above.)
> The simple solution would be to change the back-reference in #1 into a
> WeakReference. I'll attach a patch here that might be suitable.
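
The WeakHashMap behaviour described in the quoted issue can be reproduced in isolation. What follows is a minimal sketch only. It is not the attached Main.java and not the Thrift code. Underlying, StrongWrapper and WeakWrapper are hypothetical stand-ins for TTransport and the two variants of the decorating transport (hard vs. weak back-reference). A WeakHashMap holds its values through ordinary strong references, so a value that refers back to its own key keeps that key reachable and the entry is never expunged. The sketch assumes that System.gc() actually triggers a collection, which is only a hint to the JVM, so it retries for about a second before giving up.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for the map key (TTransport in the issue).
class Underlying {}

// Value with a hard back-reference to its own key (the leaking shape).
class StrongWrapper {
    final Underlying underlying;
    StrongWrapper(Underlying u) { underlying = u; }
}

// Value with a weak back-reference to its key (the shape the issue proposes).
class WeakWrapper {
    final WeakReference<Underlying> underlying;
    WeakWrapper(Underlying u) { underlying = new WeakReference<>(u); }
}

class WeakMapCycleDemo {
    static final int N = 1000;

    // Populate in a separate method so no local variable keeps a key alive
    // once this frame has returned.
    static void fill(Map<Underlying, Object> map, boolean strong) {
        for (int i = 0; i < N; i++) {
            Underlying key = new Underlying();
            map.put(key, strong ? new StrongWrapper(key) : new WeakWrapper(key));
        }
    }

    // Returns how many entries are still in the map after repeated GC requests.
    static int survivors(boolean strong) {
        Map<Underlying, Object> map = new WeakHashMap<>();
        fill(map, strong);
        for (int attempt = 0; attempt < 50 && !map.isEmpty(); attempt++) {
            System.gc(); // a hint only; hence the retry loop
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("hard back-reference: " + survivors(true) + " entries remain");
        System.out.println("weak back-reference: " + survivors(false) + " entries remain");
    }
}
```

With the hard back-reference, all N entries stay in the map no matter how often the collector runs. With the weak back-reference, the keys become collectable as soon as nothing else refers to them, and the map is free to drop their entries.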