[ https://issues.apache.org/jira/browse/THRIFT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173896#comment-13173896 ]
Aaron T. Myers commented on THRIFT-1468: ---------------------------------------- bq. On second thoughts, I'm not sure if that's ideal. Remember: we need the WeakHashMap because client-code (of the factory) caches the key, i.e. the underlying TTransport object, and not the value (returned from the factory). I haven't looked at the code in a while, but I'm pretty sure that's not the case. The reason for the hash map is because the TServer implementations don't reuse the same TTransport instance for input and output, even when the same TTransportFactory instance is used for both the input and output factory. This is a requirement for the TSaslTransports. Once TSaslServerTransport#Factory#getTransport is called once, the result is indeed cached by the client-code of the factory - the TServer implementations. So, there will be a hard reference to the value, and that will in turn have a hard reference to the key. I agree with Todd that storing a WeakReference<TSaslServerTransport> should solve the problem. Mithun, would you mind testing this out? > Memory leak in TSaslServerTransport > ----------------------------------- > > Key: THRIFT-1468 > URL: https://issues.apache.org/jira/browse/THRIFT-1468 > Project: Thrift > Issue Type: Bug > Components: Java - Library > Affects Versions: 0.5, 0.9 > Reporter: Mithun Radhakrishnan > Labels: OOM, WeakHashMap, WeakReference > Attachments: Main.java, thrift-1468.patch > > > I'm working on the HCatalog project. HCatalog uses a (slightly dated) version > of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a > continuously running process that serves (meta)data over thrift. (The bug I > describe is related to HCATALOG-183.) > We observed that on running the HCatalog-server with continuous > client-requests, the memory footprint of the server grows steadily, until we > see an OutOfMemoryError exception. I took a memory snapshot of the running > process, to check for leaks. I noticed that the majority of the memory (over > 1.3GB) was being consumed by the > org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There > were over 52000 instances of WeakHashMap$Entry, consuming 3MB of > shallow-heap, and 1.3GB of retained heap. > I suspect that entries in the WeakHashMap (transportMap) are not being > collected during GC, as is expected in code. That would only be so if there > are outstanding hard-references to the key in the map (TTransport). > From the code in TSaslTransport and TSaslServerTransport, it appears that > there is an inadvertent cyclic reference that the runtime is unable to detect: > 1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", > i.e. TTransport. > 2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, > TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the > decorating TSaslServerTransport. > From #2, an entry can only be GCed if there's no outstanding hard-reference > to the TTransport. But from #1, the hard-reference comes from the value-part > of the hashmap entry. The runtime can't deduce that there's a cycle, > presumably because it's not explicit. > (I'll be attaching a sample program to better illustrate the WeakHashMap > behaviour, in case I've botched the explanation above.) > The simple solution would be to change the back-reference in #1 into a > WeakReference. I'll attach a patch here that might be suitable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira