Re: ThreadLocal causing memory leak with J2EE applications

Michael McCandless Wed, 10 Sep 2008 08:59:00 -0700


Chris,

After you close your IndexSearcher/Reader, is it possible you're still holding a reference to it?


Mike

Chris Lu wrote:

Frankly I don't know why TermInfosReader.ThreadResources is not showing up in the memory snapshot.
Yes, it's been there for a long time. But let's see what's changed: an LRU cache, termInfoCache, was added. Previously the SegmentTermEnum would be released, since it's a relatively simple object. But with a cache added to the same class, ThreadResources, which holds many objects, and with the threads still hanging around, the cache cannot be released, so in turn the SegmentTermEnum cannot be released, so the RAMDirectory cannot be released.
My test is too coupled with the software I am working on and is not easy to post here. But here is a similar case from another user:
-----------------------------------------------------------------------------------
i found a forum post from you here [1] where you mention that you
have a memory leak using the lucene ram directory. I'd like to ask you
if you already have resolved the problem and how you did it or maybe
you know where i can read about the solution. We are using
RAMDirectory too and figured out, that over time the memory
consumption raises and raises until the system breaks down but only
when we performing much index updates. if we only create the index and
don't do nothing except searching it, it work fine.
-----------------------------------------------------------------------------------

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: 
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!
On Wed, Sep 10, 2008 at 2:45 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
I still don't quite understand what's causing your memory growth.
SegmentTermEnum instances have been held in a ThreadLocal cache in TermInfosReader for a very long time (at least since Lucene 1.4).
If indeed it's the RAMDir's contents being kept "alive" due to this, then you should have already been seeing this problem before rev 659602. And I still don't get why your reference tree is missing the TermInfosReader.ThreadResources class.
I'd like to understand the root cause before we hash out possible solutions.
Can you post the sources for your load test?

Mike


Chris Lu wrote:
Actually, even if I only use one IndexReader, some resources are cached via the ThreadLocal cache, and cannot be released unless all threads do the close action.
SegmentTermEnum itself is small, but it holds the RAMDirectory along the path, which is big.
On Tue, Sep 9, 2008 at 10:43 PM, robert engels <[EMAIL PROTECTED]> wrote:
You do not need a pool of IndexReaders...
It does not matter what class it is; what matters is the class that ultimately holds the reference.
If the IndexReader is never closed, the SegmentReader(s) is never closed, so the thread local in TermInfosReader is not cleared (because the thread never dies). So you will get one SegmentTermEnum per thread, per segment.
The SegmentTermEnum is not a large object, so even if you had 100 threads and 100 segments, for 10k instances, it seems hard to believe that is the source of your memory issue.
The SegmentTermEnum is cached by thread since it needs to enumerate the terms; not having a per-thread cache would lead to lots of random access when multiple threads read the index - very slow.
You need to keep in mind, what if every thread was executing a search simultaneously - you would still have 100x100 SegmentTermEnum instances anyway! The only way to prevent that would be to create and destroy the SegmentTermEnum on each call (opening and seeking to the proper spot) - which would be SLOW SLOW SLOW.
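
The thread-lifetime point above can be illustrated with a minimal sketch in plain Java. It does not use Lucene: the class name PooledThreadLocalDemo, the CACHE field and the byte array standing in for index data are all assumptions made for illustration. It only shows that a value stored in a ThreadLocal by a pooled thread survives between tasks, and that another thread calling remove() does not clear it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PooledThreadLocalDemo {
    // Hypothetical stand-in for a per-thread cache; the byte[] plays the role of index data.
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    /** Returns the size of the value still cached on the pooled thread after the caller tried to clear it. */
    static int sizeStillCachedOnWorker(int bytes) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // "Request 1": the worker thread fills its own ThreadLocal slot.
            pool.submit(() -> CACHE.set(new byte[bytes])).get();
            // The calling thread tries to clear the cache; this only touches the caller's own slot.
            CACHE.remove();
            // "Request 2": runs on the same pooled thread, and the value is still there.
            return pool.submit(() -> {
                byte[] cached = CACHE.get();
                return cached == null ? 0 : cached.length;
            }).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeStillCachedOnWorker(1024)); // prints 1024
    }
}
```

In an application server the pool threads typically outlive individual requests, so anything reachable from such a slot stays reachable until that same thread clears it or exits.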
On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:
I have tried to create an IndexReader pool and dynamically create searchers. But the memory leak is the same. It's not related to the Searcher class specifically, but to the SegmentTermEnum in TermInfosReader.
On Tue, Sep 9, 2008 at 10:14 PM, robert engels <[EMAIL PROTECTED]> wrote:
A searcher uses an IndexReader - the IndexReader is slow to open, not a Searcher. And searchers can share an IndexReader.
You want to create a single shared (across all threads/users) IndexReader (usually), and create a Searcher as needed and dispose of it. It is VERY CHEAP to create the Searcher.
I am fairly certain the javadoc on Searcher is incorrect. The warning "For performance reasons it is recommended to open only one IndexSearcher and use it for all of your searches" is not true in the case where an IndexReader is passed to the ctor.
Any caching should USUALLY be performed at the IndexReader level.
You are most likely using the "path" ctor, and that is the source of your problems, as multiple IndexReader instances are being created, and thus the memory use.
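
The pattern described above (one shared reader, one cheap searcher per request) might look roughly like the following sketch. The Reader and Searcher classes here are hypothetical stand-ins, not the Lucene classes, and the counters exist only to make the behavior observable; treat it as an illustration under those assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedReaderSketch {
    /** Hypothetical stand-in for an index reader that is expensive to open. */
    static final class Reader { }

    /** Hypothetical stand-in for a searcher: a thin wrapper around a reader. */
    static final class Searcher {
        private final Reader reader;
        Searcher(Reader reader) { this.reader = reader; }
        Reader reader() { return reader; }
    }

    final AtomicInteger readersOpened = new AtomicInteger();
    final AtomicInteger searchersCreated = new AtomicInteger();

    // Opened once and shared by every request thread.
    private final Reader sharedReader = openReader();

    private Reader openReader() {
        readersOpened.incrementAndGet();
        return new Reader();
    }

    /** One request: wrap the shared reader in a short-lived searcher, use it, drop it. */
    void handleRequest() {
        Searcher searcher = new Searcher(sharedReader);
        searchersCreated.incrementAndGet();
        // ... run the query against searcher.reader() and render the page ...
    }

    /** Simulates the given number of requests handled by a fixed pool of threads. */
    void serve(int requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            pool.execute(this::handleRequest);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SharedReaderSketch service = new SharedReaderSketch();
        service.serve(100, 4);
        // prints "1 reader, 100 searchers"
        System.out.println(service.readersOpened.get() + " reader, "
                + service.searchersCreated.get() + " searchers");
    }
}
```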
On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:
In a J2EE environment, usually there is a searcher pool with several searchers open.
The time it takes to open a large index for every user is not acceptable.

On Tue, Sep 9, 2008 at 9:03 PM, robert engels <[EMAIL PROTECTED]> wrote:
You need to close the searcher within the thread that is using it, in order to have it cleaned up quickly... usually right after you display the page of results.
If you are keeping multiple searcher refs across multiple threads for paging/whatever, you have not coded it correctly.
Imagine 10,000 users - storing a searcher for each one is not going to work...
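
As a rough sketch of "close it in the thread that used it, right after rendering the page", assuming a hypothetical PageSearcher stand-in rather than the real Lucene searcher:

```java
final class CloseInRequestThread {
    /** Hypothetical stand-in for a searcher that records which thread created and closed it. */
    static final class PageSearcher implements AutoCloseable {
        private final Thread owner = Thread.currentThread();
        private Thread closedBy; // null until close() is called

        String search(String query) {
            if (closedBy != null) {
                throw new IllegalStateException("searcher is closed");
            }
            return "results for " + query;
        }

        @Override
        public void close() {
            closedBy = Thread.currentThread();
        }

        boolean isClosed() { return closedBy != null; }
        boolean closedByOwner() { return closedBy == owner; }
    }

    /** Handles one request: create, search, render, and close, all on the calling thread. */
    static PageSearcher handleRequest(String query, StringBuilder page) {
        PageSearcher searcher = new PageSearcher();
        try {
            page.append(searcher.search(query));
        } finally {
            // Closed here, on the request thread, instead of being parked for another thread.
            searcher.close();
        }
        return searcher; // returned only so the caller can inspect its state
    }

    public static void main(String[] args) {
        StringBuilder page = new StringBuilder();
        PageSearcher searcher = handleRequest("lucene", page);
        System.out.println(page + " / closed on owner thread: " + searcher.closedByOwner());
    }
}
```

Nothing is stored per user between requests in this sketch; paging state would be recomputed from the query on the next request.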
On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:
Right, in a sense I cannot release it from another thread. But that's the problem.
It's a J2EE environment; all threads are kind of equal. It's simply not possible to iterate through all threads to close the searcher, thus releasing the ThreadLocal cache.
Unless Lucene is not recommended for J2EE environments, this has to be fixed.
On Tue, Sep 9, 2008 at 8:14 PM, robert engels <[EMAIL PROTECTED]> wrote:
Your code is not correct. You cannot release it on another thread - the first thread may be creating hundreds/thousands of instances before the other thread ever runs...
On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
If I release it on the thread that's creating the searcher, by setting searcher=null, everything is fine; the memory is released very cleanly.
My load test was to repeatedly create a searcher on a RAMDirectory and release it on another thread. The test quickly goes to OOM after several runs. I set the heap size to 1024M, and the RAMDirectory is 250M in size. Using a profiling tool, I could see the used size visibly step up by 250M.
I think we should not rely on something that's a "maybe" behavior, especially for a general-purpose library.
Since it's a multi-threaded environment, the thread that's creating the entries in the LRU cache may not go away quickly (actually most, if not all, application servers will try to reuse threads), so the LRU cache, which uses the thread as the key, cannot be released, so the SegmentTermEnum, which is in the same class, cannot be released.
And yes, I close the RAMDirectory, and the fileMap is released. I verified that through the profiler by directly checking the values in the snapshot.
Pretty sure the reference tree wasn't like this with the code before this commit, because after closing the searcher in another thread, the RAMDirectory totally disappeared from the memory snapshot.
On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
Chris Lu wrote:
The problem should be similar to what's talked about in this discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
The "rough" conclusion of that thread is that, technically, this isn't a memory leak but rather a "delayed freeing" problem. I.e., it may take longer, possibly much longer, than you want for the memory to be freed.
There is a memory leak for Lucene search from LUCENE-1195 (svn r659602, May 23, 2008).
This patch brings in a ThreadLocal cache to TermInfosReader.
One thing that confuses me: TermInfosReader was already using a ThreadLocal to cache the SegmentTermEnum instance. What was added in this commit (for LUCENE-1195) was an LRU cache storing Term -> TermInfo instances. But it seems like it's the SegmentTermEnum instance that you're tracing below.
It's usually recommended to keep the reader open, and reuse it when possible. In a common J2EE application, the HTTP requests are usually handled by different threads. But since the cache is ThreadLocal, the cache is not really usable by other threads. What's worse, the cache cannot be cleared by another thread!
This leak is not so obvious usually. But my case is using a RAMDirectory holding several hundred megabytes. So one unreleased resource is obvious to me.

Here is the reference tree:
org.apache.lucene.store.RAMDirectory
 |- directory of org.apache.lucene.store.RAMFile
   |- file of org.apache.lucene.store.RAMInputStream
       |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
           |- input of org.apache.lucene.index.SegmentTermEnum
               |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
So you have a RAMDir that has several hundred MB stored in it, that you're done with, yet through this path Lucene is keeping it alive?
Did you close the RAMDir? (which will null its fileMap and should also free your memory).
Also, that reference tree doesn't show the ThreadResources class that was added in that commit -- are you sure this reference tree wasn't from before the commit?
Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



